Query answering under existential rules (implications with existential quantifiers
in the head) is known to be decidable when imposing restrictions on the
rule bodies such as frontier-guardedness [BLM10, BLMS11]. Query answering is 
also decidable for description logics [Baa03], which further allow disjunction and 
functionality constraints (assert that certain relations are functions); however, they 
are focused on ER-type schemas, where relations have arity two. 

This work investigates how to get the best of both worlds: having decidable 
existential rules on arbitrary arity relations, while allowing rich description logics, 
including functionality constraints, on arity-two relations. We first show negative 
results on combining such decidable languages. Second, we introduce an expressive 
set of existential rules (frontier-one rules with a certain restriction) which can be 
combined with powerful constraints on arity-two relations (e.g., $\mathcal{GC}^2$, $\mathcal{ALCQI}b$)
while retaining decidable query answering. Further, we provide conditions to add 
functionality constraints on the higher-arity relations. 


1. Introduction 

Recent years have seen an explosion of techniques for solving the query answering problem:
given a query q, a conjunction F of atoms, and a set of logical constraints $\Sigma$, determine
whether q follows from F and $\Sigma$. In databases this is called querying under constraints or
the certain answer problem, seeing F as an incomplete database, and $\Sigma$ as restrictions on the
possible completions. For researchers working on description logics, F is referred to as the
A-box and $\Sigma$ as the T-box. In both communities q is usually a conjunctive query, an existential
quantification of conjunctions of atoms, equivalent to a basic SQL SELECT. We will make
this assumption throughout this work, referring for simplicity to the problem as just “query
answering” (QA).

QA is undecidable when $\Sigma$ ranges over arbitrary first-order logic constraints. This motivates
the search for restricted constraint languages with decidable QA. Within the description logic 
community, powerful such languages were developed to express constraints on vocabularies of 
arity two. The unary relations are referred to as concepts while the binary ones are the roles. 
The languages can build new concepts and roles from basic ones via Boolean operations and 
(limited) quantification, and many of them, such as DL-Lite [CDGL+05] or $\mathcal{ALCQI}b$ [Tob01],
may restrict the input roles R(x, y) to be functional: for all x there is at most one y such that
R(x, y). Functionality constraints are crucial to faithfully model many real-world relationships:
the relationship of a person to their birthdate, the relationship of an event to its starting time, 
etc. Hence, description logics are very powerful languages for arity-two vocabularies. 

In parallel, the AI and database communities have developed rich constraint languages on 
arbitrary arity via existential rules or tuple-generating dependencies (TGDs). Existential rules 
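are constraints of the form $\forall \mathbf{x}\, (\phi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x}', \mathbf{y}))$ where $\mathbf{x}' \subseteq \mathbf{x}$ and $\phi$ and $\psi$ are conjunctions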
of atoms. They generalize the well-known inclusion dependencies or referential constraints 
in databases [AHV95], and can also express mapping relationships used in data exchange 
[FKMP05] and data integration [Len02]. Although QA over general rules is undecidable,
important subclasses are decidable. First, decidability holds whenever the chase procedure 
[AHV95] is guaranteed to terminate, which is ensured by a number of conditions on the rules, 
e.g., weak acyclicity [FKMP05], joint acyclicity [KR11], or the very restricted class of source- 
to-target TGDs. See [GHK+13] for a survey and [BGMR14] for a recent study. A second class 
of tame constraints are those that admit bounded-treewidth models. There are several such 
classes, such as guarded TGDs [CGL12], frontier-guarded TGDs [BLM10], or the more general 
greedy bounded-treewidth sets [BMRT11]. However, many features of description logics, such 
as disjunction or functionality restrictions, cannot be expressed by existential rules. 

Could we then enjoy the best of both worlds, by allowing both description logic constraints 
and existential rules, while maintaining the decidability of QA? This paper studies to what 
extent both paradigms can be combined, by looking for classes of constraints with decidable 
QA over relational schemas of arbitrary arity that can (1) express non-trivial existential rules
over any relation in the schema and (2) assert expressive constraints (e.g., in $\mathcal{ALCQI}b$) on the
arity-two subschema, i.e., the subset of the relations of arity one and two within the schema.

Our first results (Section 3) are negative: we show that arity-two languages featuring
functionality constraints on the arity-two subschema may lead to undecidable QA when combined
with even very simple acyclic rules (source-to-target TGDs, S2T), or with the simplest
existential rules that export two variables (frontier-two inclusion dependencies, ID[2]). More
surprisingly, undecidability can occur with rules exporting only a single variable, the class of 
frontier-one dependencies FR[1] of [BLMS09]. We say the existential rule languages S2T, ID[2], 
FR[1] are destructive of arity-two QA. 

We then show (Section 4) that by restricting FR[1] slightly, imposing that the head of the 
rules have a certain tree shape (denoted “non-looping”), we can obtain a class of existential 
rules that can be combined with expressive constraints on the arity-two schema while
maintaining decidable QA (we call this not destructive). The reduction proceeds in two steps. We
first handle rules with tree-shaped bodies, via a direct rewriting technique to constraints on 
an arity-two encoding of the schema. Second, we handle rules with non-tree-shaped bodies, 
showing that the bodies can be soundly replaced by a tree-shaped approximation. Soundness 
is proven by extending the technique of “treeification” used previously in many modal and 
guarded logics (e.g., [BGO14]), showing that models of the constraints can be “unraveled” to
be tree-shaped. 

We go on to study (Section 5) the addition of functional dependencies (FDs), a well-known 
generalization of description logic functionality constraints to arbitrary arity. QA with
existential rules and FDs is generally undecidable unless their interaction with the existential rules is
controlled, e.g., by imposing the non-conflicting condition [CGP12]. We show that FDs can be 
added to our existential rules while maintaining decidable QA with the arity-two constraints, 
as long as the non-conflicting condition is satisfied. As in the standard non-conflicting setting, 
we show that the FDs can always be satisfied unless the initial facts violate them. We prove 
this by modifying the unraveling argument. 

Our results have the advantage that QA for our combined constraints reduces to QA on an 
arity-two schema; hence, existing QA algorithms for rich description logics could be extended 
to arbitrary arity signatures with expressive constraints. 

Related work. A great deal of research has centered around the integration of DLs with 
Datalog-style rules, including work as early as the 1990’s, when the languages AL-Log [DLNS91] 
and CARIN [LR98] were introduced. AL-Log links Horn rules with concepts from a description 
logic terminology, while the later language CARIN provides a broader framework allowing both 
concepts and roles from a terminology to appear in rules. [LR98] provides both entailment 
algorithms for CARIN and undecidability results exploring the borderline for combining rules 
and DLs. 

Datalog rules, however, unlike the existential rules that we consider in this work, do not 
allow existential quantification in the head, so they cannot assert the existence of higher-arity 
facts on fresh elements. 

Another approach to combination is to use description logics that support higher-arity relations
directly. Languages such as $\mathcal{DLR}_{reg}$ [CGL08] give some support for higher arity while retaining
a DL-style syntax. Unlike them, we support existential rules with cyclic bodies that cannot be
encoded in $\mathcal{DLR}_{reg}$, as well as arbitrary higher-arity functional dependencies that go beyond
DL-expressible functionality assertions. On the other hand, we do not support some features
of $\mathcal{DLR}_{reg}$, such as regular expressions on role paths. Indeed, we do not consider the interaction
of rules with DLs supporting transitivity and other recursion mechanisms [GLHS08], focusing 
instead only on first-order-expressible constraints given by decidable DLs and existential rules. 

2. Preliminaries 

Signatures, facts, queries. A signature $\sigma$ consists of relation names (e.g., $R$), each with an
associated arity (e.g., $|R|$). We write $\sigma$ as $\sigma_{\leq 2} \cup \sigma_{>2}$, containing respectively the relations of arity
$\leq 2$ and the higher-arity relations with arity $> 2$. An atom $R(\mathbf{x})$ consists of a relation name $R$
and an $|R|$-tuple $\mathbf{x}$ of variables. A $\sigma$-fact (or just fact when $\sigma$ is clear from context) is a
conjunction of atoms using relations in $\sigma$. A Boolean conjunctive query (or CQ) is an existentially
quantified conjunction of atoms. In this paper we assume for simplicity that CQs are Boolean, 
i.e., have no free variables, and we disallow constants. This is without loss of generality: for 
non-Boolean queries we can enumerate all possible assignments, and constants can be encoded 
with fresh unary relations. 
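
To make the notions of fact and Boolean CQ concrete, the following minimal Python sketch evaluates a Boolean CQ over one finite set of ground atoms by brute-force search for a variable binding. The encoding of atoms as (relation, tuple) pairs and the name `cq_holds` are illustrative assumptions, not notation from this paper. The sketch only inspects a single interpretation; it is not a QA procedure, since QA ranges over all interpretations satisfying the constraints.

```python
from itertools import product


def cq_holds(query_atoms, facts):
    """Return True if some binding of the query variables maps every
    query atom to a ground atom of `facts` (hypothetical encoding:
    each atom is a (relation_name, tuple_of_terms) pair)."""
    variables = sorted({v for _, args in query_atoms for v in args})
    domain = sorted({e for _, args in facts for e in args})
    for values in product(domain, repeat=len(variables)):
        binding = dict(zip(variables, values))
        if all((rel, tuple(binding[v] for v in args)) in facts
               for rel, args in query_atoms):
            return True
    return False


# Ground atoms U(a), R(a, b), S(c, c, a), and the CQ
# "exists x, y, z. U(x), R(x, y), S(z, z, x)".
facts = {("U", ("a",)), ("R", ("a", "b")), ("S", ("c", "c", "a"))}
query = [("U", ("x",)), ("R", ("x", "y")), ("S", ("z", "z", "x"))]
```

Here `cq_holds(query, facts)` succeeds with the binding x = a, y = b, z = c.
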

Constraints, QA. We consider constraints that are formulae in function-free and constant-free
first-order logic (FO), on the signature $\sigma$. A $\sigma$-interpretation $\mathcal{I}$ (or just interpretation)
consists of a domain dom($\mathcal{I}$) and an interpretation function $\cdot^{\mathcal{I}}$ mapping each relation $R$ of $\sigma$
to a set $R^{\mathcal{I}}$ of $|R|$-tuples of dom($\mathcal{I}$). The definition of $\mathcal{I}$ satisfying a FO formula $\phi$, written
$\mathcal{I} \models \phi$, is standard. A witness $\mathcal{W}$ of F in $\mathcal{I}$ is an interpretation that maps each relation $R$ to
the tuples in $R^{\mathcal{I}}$ obtained by substituting the atoms of F using some variable binding $w$ such
that $\mathcal{I} \models F(w)$.

We study the query answering problem (QA): given a fact F, a set of constraints $\Sigma$, and a
CQ q, decide the validity of $\forall \mathbf{x}\, (F(\mathbf{x}) \wedge \Sigma \rightarrow q)$; that is, whether F and $\Sigma$ entail q. In this case,
we write $F \wedge \Sigma \models q$. The combined complexity of QA, for a fixed class of constraints, is the
complexity of deciding it when all of F, $\Sigma$ (in the constraint class) and q are given as input.
If we assume that $\Sigma$ and q are fixed, and only F is given as input, then we define instead the
data complexity.

The QA problem above allows arbitrary FO constraint classes. Below we present two kinds 
of integrity constraints that are known to enjoy decidable QA. 

Existential rules. An existential rule (or tuple-generating dependency, or TGD) is a logical
constraint of the form $\forall \mathbf{x}\, (\phi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x}', \mathbf{y}))$, with $\mathbf{x}' \subseteq \mathbf{x}$, where the body $\phi$ and head $\psi$ are
conjunctions of atoms. Equality atoms and constants are disallowed. For brevity, in rules we
often omit the quantification on $\mathbf{x}$ and write ‘$\wedge$’ as a comma. A rule is single-head if its head
consists of only one atom.

QA is undecidable for general rules (following from [BV81]). One class of rules with decidable 
QA are those satisfying acyclicity conditions. We will show negative results for one of the most 
restrictive classes, the class S2T of source-to-target TGDs, where $\sigma$ is partitioned as $\sigma = \sigma_s \cup \sigma_t$,
the bodies of all rules only use relations in $\sigma_s$, and the heads only use relations in $\sigma_t$. Our
results on S2T extend to more permissive acyclicity conditions, such as those mentioned in the 
introduction. 

A second class of decidable rules guarantees that it suffices to consider bounded-treewidth 
interpretations, usually because of constraints on the rule bodies. We focus on the class FR[1]
of frontier-one rules, following [BLMS09]: the frontier of a rule is the set $\mathbf{x}'$ of variables that
occur both in the body and the head, and a rule is frontier-one if $|\mathbf{x}'| = 1$. The class of inclusion
dependencies ID imposes that the head and body are single atoms where each variable is used
only once and that the frontier is not empty, and we will focus on the class ID[2] of the inclusion
dependencies with frontier size 2. QA is decidable for FR[1] [BLMS09]. For ID it is decidable 
and has PTIME data complexity [CLR03b]. 
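
The syntactic conditions above are easy to check mechanically. The following Python sketch computes the frontier of a rule and tests membership in FR[1] and ID; the representation of a rule as two lists of (relation, variables) atoms and all function names are assumptions made for illustration only.

```python
def variables(atoms):
    """Set of variables occurring in a list of (relation, args) atoms."""
    return {v for _, args in atoms for v in args}


def frontier(body, head):
    """Variables that occur both in the body and in the head."""
    return variables(body) & variables(head)


def is_frontier_one(body, head):
    return len(frontier(body, head)) == 1


def is_inclusion_dependency(body, head):
    """Single body atom, single head atom, no variable repeated inside
    an atom, and a non-empty frontier."""
    if len(body) != 1 or len(head) != 1:
        return False
    no_repeats = all(len(set(args)) == len(args) for _, args in body + head)
    return no_repeats and len(frontier(body, head)) >= 1


# A(x) -> exists y, z. R(x, y), S(y, z): only x is exported.
rule_body = [("A", ("x",))]
rule_head = [("R", ("x", "y")), ("S", ("y", "z"))]

# T(x, y, z) -> exists w. U(y, w, x): an inclusion dependency exporting x and y.
id_body = [("T", ("x", "y", "z"))]
id_head = [("U", ("y", "w", "x"))]
```

Under this encoding the first rule is frontier-one, while the second is an inclusion dependency with frontier size 2, i.e., a member of ID[2].
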

Existential rules can be augmented with functional dependencies (FDs), which are variants 
of existential rules that impose equalities. Writing $\forall \mathbf{x} = \forall x_1 \cdots \forall x_n$ and similarly for $\mathbf{y}$, an FD
on the relation $R$ is of the form:

$$\forall \mathbf{x} \mathbf{y}\, \Bigl(R(x_1, \ldots, x_n) \wedge R(y_1, \ldots, y_n) \wedge \bigwedge_{l \in L} x_l = y_l\Bigr) \rightarrow x_r = y_r$$

for some $1 \leq r \leq |R|$ and some subset $L \subseteq \{1, \ldots, |R|\}$ which we call the determiner of the
FD. QA is undecidable when combining existential rules and arbitrary FDs; for instance, it is
undecidable for ID[2] and FDs [CLR03a].
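
As a small illustration of what an FD asserts, the sketch below checks one FD on a finite set of tuples of a relation. The function name `satisfies_fd` and the encoding (1-indexed positions, determiner given as a set) are assumptions chosen to mirror the formula above; the birthdate data is a made-up example.

```python
def satisfies_fd(tuples, determiner, r):
    """Check the FD with determiner L (a set of 1-indexed positions) and
    determined position r: tuples agreeing on L must agree on r."""
    seen = {}
    positions = sorted(determiner)
    for row in tuples:
        key = tuple(row[i - 1] for i in positions)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, row[r - 1]) != row[r - 1]:
            return False
    return True


# Hypothetical binary relation linking a person to a birthdate.
birthdate = {("alice", "1990-01-01"), ("bob", "1985-05-05")}
```

On this data, `satisfies_fd(birthdate, {1}, 2)` holds, and it would fail after adding a second, different birthdate for the same person.
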

Arity-two constraints. The second kind of tame constraints are arity-two constraints, which
are only defined on $\sigma_{\leq 2}$. The most general such language that we study is the two-variable
guarded fragment with counting quantifiers, $\mathcal{GC}^2$ [Kaz04]. $\mathcal{GC}^2$ is the smallest class of constant-free
FO formulae with at most two variables, containing all atoms for $\sigma_{\leq 2}$ relations, closed
under Boolean connectives, under guarded universal and existential quantification, and under
number quantifications: if $\phi(x, y)$ is a $\mathcal{GC}^2$ formula and $A(x, y)$ is an arity-two atom with two
free variables (the guard), then $\exists^{\geq n} y\, A(x, y) \wedge \phi(x, y)$ and $\exists^{\leq n} y\, A(x, y) \wedge \phi(x, y)$ are formulae,
where $n$ is an integer. QA for $\mathcal{GC}^2$ is decidable and its data complexity is in co-NP [PH09].

Description logics (DLs) are arity-two constraint languages. Examples of DLs are DL-Lite
[CDGL+05], a lightweight DL often used in the context of ontology-based data access,
and $\mathcal{ALCQI}b$ [Tob01], a more expressive DL that can make full use of number restrictions, a
useful feature in practice. Both DL-Lite and $\mathcal{ALCQI}b$ can assert concept inclusions like $C \sqsubseteq C'$,
where $C$ and $C'$ are concepts (arity 1 relations), meaning that $C'$ holds whenever $C$ does; and
functionality assertions funct($R$), where $R$ is a role (an arity 2 relation), corresponding to
$\forall x\, \exists^{\leq 1} y\, R(x, y)$ in $\mathcal{GC}^2$, or to the FD: $\forall x_1 x_2 y_1 y_2\; R(x_1, x_2) \wedge R(y_1, y_2) \wedge x_1 = y_1 \rightarrow x_2 = y_2$.
Despite its expressiveness, $\mathcal{ALCQI}b$ can still, like DL-Lite, be captured by $\mathcal{GC}^2$, which implies
decidable QA.

Roles and concepts can be atomic (i.e., from $\sigma_{\leq 2}$) or defined using constructors; we give
some examples from $\mathcal{ALCQI}b$. The inverse $R^-$ of an atomic role $R$ is such that $R^-(b, a)$ holds
whenever $R(a, b)$ does. An intersection of roles, which is written $R_1 \sqcap \cdots \sqcap R_n$, holds for $(a, b)$
whenever $R_i(a, b)$ holds for all $1 \leq i \leq n$. $\top$ and $\bot$ are the true and false concepts. The
intersection of concepts $C_1, \ldots, C_n$, written $C_1 \sqcap \cdots \sqcap C_n$, holds whenever each of the $C_i$ does.
The negation $\neg C$ of a concept $C$ holds for elements where $C$ does not hold. An existential
concept $\exists R.C$ for a role $R$ and concept $C$ holds for every element $a$ such that $\exists b\; R(a, b) \wedge C(b)$
does. Note that many of these features (e.g., functionality assertions and negation) cannot be
expressed as existential rules.

Combining constraint classes. For any class CL of existential rules, we call CL non-destructive
(of arity-two QA) if QA is decidable for the class $\mathrm{CL} \wedge \mathcal{GC}^2$ of conjunctions of constraints of CL
(on $\sigma$) and of constraints of $\mathcal{GC}^2$ (on $\sigma_{\leq 2}$). Otherwise, we call CL destructive.

3. Negative Results for Combination 

We now present classes of existential rules which have decidable QA but are destructive. First, 
we observe that even the simplest class of rules that ensures decidability based on chase
termination, the class S2T of source-to-target TGDs, is destructive. This is not so surprising,
since the arbitrary constraints on the arity-two signature may add dependencies that are not 
source-to-target. 

Theorem 3.1. S2T is destructive of arity-two QA, even when the whole $\sigma$ has arity two
and there is no query (i.e., this is just the satisfiability problem asking whether the fact and 
constraints are satisfiable). 

Thus we move on to classes of existential rules that are decidable because of guardedness 
assumptions. 

We first observe that the class ID[2] of frontier-two inclusion dependencies is destructive 
of arity-two QA. In fact, functionality assertions on the binary relations are sufficient to get 
undecidability, because they can be lifted to functionality assertions on higher-arity relations 
using ID[2]. Thus, following a standard reduction from QA to entailment of dependencies as 
in [CLR03a], we can use the undecidability of entailment for ID[2] and FDs (Theorem 2 of 
[Mit83], which we adapt slightly) and prove the following: 

Theorem 3.2. ID[2] is destructive of arity-two QA. In particular, QA is undecidable for
$\mathrm{ID}[2] \wedge \mathcal{D}$, for any DL $\mathcal{D}$ (such as DL-Lite) featuring functionality assertions.

More surprisingly, frontier-one rules FR[1] are destructive of arity-two QA, even though 
they can only export a single variable, and this holds even when the whole $\sigma$ has arity two.
The reason is that FR[1] may be more expressive than $\mathcal{GC}^2$ as it can disobey the two-variable
restriction.

Theorem 3.3. FR[1] is destructive of arity-two QA, even when the whole $\sigma$ has arity two and
there is no query. 

This motivates the search for more restricted existential rule classes which could be
non-destructive of arity-two QA.

4. From Existential Rules to Arity-Two 

We will focus on the subclass of frontier-one rules whose heads do not contain non-trivial Berge 
cycles [Fag83]. 

Definition 4.1. A Berge cycle in a conjunction of atoms $\Psi$ is a sequence
$A_1, x_1, A_2, x_2, \ldots, A_n, x_n$ of length $n > 1$ where the $x_i$ are pairwise distinct variables,
the $A_i$ are pairwise distinct atoms of $\Psi$, and every $x_i$ occurs in atoms $A_i$ and $A_{i+1}$
(with addition modulo $n$, so $x_n$ occurs in $A_1$).

We say $\Psi$ is non-looping if there is no Berge cycle of length above 2, and no Berge cycle
that contains an atom of $\sigma_{>2}$.

We define the head-non-looping subclass $\mathrm{FR}[1]_{Hnl}$ of FR[1] rules whose heads are non-looping.
In particular, single-head FR[1] rules are always head-non-looping.

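Example 4.2. The rules $A(x) \rightarrow \exists y z\; R(x, y), S(y, z), T(z, x)$ and $B(x) \rightarrow \exists y z\; R(x, y), U(x, y, z)$
are not in $\mathrm{FR}[1]_{Hnl}$: the head of the first contains a Berge cycle of length 3, and the head of
the second contains a Berge cycle of length 2 through the higher-arity atom $U(x, y, z)$.

However, $A(x) \rightarrow \exists y\; V(x, x, y, y)$ and $B(x) \rightarrow \exists y\; R(x, y), S(x, y), R(y, x)$
are in $\mathrm{FR}[1]_{Hnl}$.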
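
The non-looping condition of Definition 4.1 can be tested by brute force on small conjunctions. The Python sketch below enumerates Berge cycles directly from the definition and applies it to the four rule heads of Example 4.2. The atom encoding and the function names are assumptions for illustration; the search is exponential in general and only intended for rule-sized inputs.

```python
def berge_cycles(atoms):
    """Yield every Berge cycle as (atom indices, variables), by brute force.

    Atoms are (relation, args) pairs; duplicates are dropped because the
    definition asks for pairwise distinct atoms.
    """
    atoms = list(dict.fromkeys(atoms))

    def extend(path_atoms, path_vars):
        last_args = atoms[path_atoms[-1]][1]
        for x in sorted(set(last_args) - set(path_vars)):
            # x closes a cycle if it also occurs in the first atom.
            if len(path_atoms) >= 2 and x in atoms[path_atoms[0]][1]:
                yield path_atoms, path_vars + [x]
            # x may also lead on to an atom that is not yet on the path.
            for j, (_, args) in enumerate(atoms):
                if j not in path_atoms and x in args:
                    yield from extend(path_atoms + [j], path_vars + [x])

    for start in range(len(atoms)):
        yield from extend([start], [])


def is_non_looping(atoms):
    """No Berge cycle of length above 2, and none through an atom of arity above 2."""
    atoms = list(dict.fromkeys(atoms))
    for cycle_atoms, _ in berge_cycles(atoms):
        if len(cycle_atoms) > 2:
            return False
        if any(len(atoms[i][1]) > 2 for i in cycle_atoms):
            return False
    return True


# The four rule heads of Example 4.2.
head_triangle = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]
head_ternary = [("R", ("x", "y")), ("U", ("x", "y", "z"))]
head_single = [("V", ("x", "x", "y", "y"))]
head_binary = [("R", ("x", "y")), ("S", ("x", "y")), ("R", ("y", "x"))]
```

With this encoding, `is_non_looping` rejects the first two heads and accepts the last two, in line with the example.
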

We claim that head-non-looping rules are non-destructive, in contrast with general frontier- 
one rules (Theorem 3.3): 

Theorem 4.3. $\mathrm{FR}[1]_{Hnl}$ is not destructive of arity-two QA.

Of course, this means that QA is decidable for $\mathrm{FR}[1]_{Hnl} \wedge \mathcal{D}$, for any DL $\mathcal{D}$ expressible in
$\mathcal{GC}^2$, such as $\mathcal{ALCQI}b$. The rest of this section proves the theorem and addresses complexity.

Shredding. Our proof of Theorem 4.3 translates the $\mathrm{FR}[1]_{Hnl}$ rules to arity-two constraints,
using a common way to represent general relational databases in a binary relational store, which
we call shredding: we represent an n-ary relation by a set of binary relations giving the link
from each tuple (materialized as an element) to its attributes. We present first the translation
of the signature $\sigma$ to its shredded arity-two signature $\sigma_S$, and the constraints imposed on
$\sigma_S$-interpretations to ensure that they can be decoded back to $\sigma$-interpretations. Second, we
explain how to shred facts and CQs.

Definition 4.4. The shredded signature $\sigma_S$ of a signature $\sigma$ consists of $\sigma_{\leq 2}$, a unary relation
Elt, and, for each $R \in \sigma_{>2}$, a unary relation $A_R$ and binary relations $R_i$ for $1 \leq i \leq |R|$.

The well-formedness constraints of $\sigma_S$, written wf($\sigma_S$), are the following DL constraints (they
are $\mathcal{ALCQI}b$-expressible):

• $C \sqsubseteq \mathrm{Elt}$ for every unary relation $C$ of $\sigma_{\leq 2}$

• $\exists R.\top \sqsubseteq \mathrm{Elt}$ and $\exists R^-.\top \sqsubseteq \mathrm{Elt}$ for all binary $R$ of $\sigma_{\leq 2}$

and the following, where $R \neq S$ are in $\sigma_{>2}$ and $1 \leq i \leq |R|$:

• $\exists R_i.\top \sqsubseteq A_R$ and $\exists R_i^-.\top \sqsubseteq \mathrm{Elt}$

• $\mathrm{Elt} \sqcap A_R \sqsubseteq \bot$ and $A_R \sqcap A_S \sqsubseteq \bot$

• $A_R \sqsubseteq \exists R_i.\top$ and funct($R_i$)

The shredding SHR(F) of a $\sigma$-fact F is the $\sigma_S$-fact obtained by adding the atom Elt(x) for
each variable x of F and replacing each atom $R(\mathbf{x})$ of F when $R \in \sigma_{>2}$ by the atoms $A_R(t)$ and
$R_i(t, x_i)$ for $1 \leq i \leq |R|$, for t a fresh variable. The shredding SHR(q) of a CQ q is similarly
defined.

Example 4.5. Considering the CQ $q : \exists x y z\; U(x), R(x, y), S(z, z, x)$, we define SHR(q) as:
$\exists x y z t\; \mathrm{Elt}(x), \mathrm{Elt}(y), \mathrm{Elt}(z), U(x), R(x, y), A_S(t), S_1(t, z), S_2(t, z), S_3(t, x)$.
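
The shredding of facts and CQs can be sketched in a few lines of Python. The encoding of atoms as (relation, args) pairs, the relation names `A_S`, `S_1`, ... as strings, and the naming scheme t1, t2, ... for fresh variables are assumptions for illustration; the function follows the definition above and reproduces Example 4.5.

```python
from itertools import count


def shred(atoms):
    """Shred a conjunction of atoms, given as (relation, args) pairs."""
    used = {v for _, args in atoms for v in args}
    # Hypothetical naming scheme for fresh tuple variables: t1, t2, ...
    fresh = (f"t{i}" for i in count(1) if f"t{i}" not in used)
    shredded = [("Elt", (v,)) for v in sorted(used)]
    for rel, args in atoms:
        if len(args) > 2:
            # Higher-arity atom: materialize the tuple as a fresh variable t
            # and link it to each attribute by a binary relation.
            t = next(fresh)
            shredded.append((f"A_{rel}", (t,)))
            shredded.extend(
                (f"{rel}_{i}", (t, x)) for i, x in enumerate(args, start=1)
            )
        else:
            # Atoms of arity at most two are kept unchanged.
            shredded.append((rel, tuple(args)))
    return shredded


q = [("U", ("x",)), ("R", ("x", "y")), ("S", ("z", "z", "x"))]
```

Calling `shred(q)` yields the Elt atoms for x, y, z, the unchanged atoms U(x) and R(x, y), and the atoms A_S(t1), S_1(t1, z), S_2(t1, z), S_3(t1, x).
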

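Fully-non-looping. The interesting part is to define the shredding of $\mathrm{FR}[1]_{Hnl}$ rules. We first
restrict to the class of fully-non-looping rules, $\mathrm{FR}[1]_{Fnl}$, whose head and body are non-looping.

We show that $\mathrm{FR}[1]_{Fnl}$ can be directly shredded to $\mathcal{GC}^2$. We will later move from $\mathrm{FR}[1]_{Fnl}$ to
$\mathrm{FR}[1]_{Hnl}$.

For any existential rule $r : \forall \mathbf{x}\; \phi(\mathbf{x}) \rightarrow \exists \mathbf{y}\; \psi(\mathbf{x}', \mathbf{y})$ with $\mathbf{x}' \subseteq \mathbf{x}$, we define its shredding
SHR(r) as the existential rule $\forall \mathbf{x} \mathbf{t}\; \mathrm{SHR}(\phi(\mathbf{x})) \rightarrow \exists \mathbf{y} \mathbf{t}'\; \mathrm{SHR}(\psi(\mathbf{x}', \mathbf{y}))$, where $\mathbf{t}$ and $\mathbf{t}'$ are
the fresh elements introduced in the shredding of $\phi$ and $\psi$ respectively. We claim the following: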

Lemma 4.6. For any $\mathrm{FR}[1]_{Fnl}$ rule r, SHR(r) can be translated in PTIME to a $\mathcal{GC}^2$ sentence
on $\sigma_S$.

Example 4.7. For brevity, this example ignores the Elt and $A_R$ atoms when shredding.
Consider the $\mathrm{FR}[1]_{Fnl}$ rule:

$$U(u), T(u, x), S(x) \rightarrow \exists y z\; T(x, y), U(y), R(x, x, z, z)$$

Its shredding is expressible in $\mathcal{GC}^2$ (and even in $\mathcal{ALCQI}b$):

$$(\exists T^-.U) \sqcap S \sqsubseteq (\exists T.U) \sqcap (\exists (R_1^- \sqcap R_2^-).(\exists (R_3 \sqcap R_4).\top))$$

By contrast, consider the following rule in $\mathrm{FR}[1] \setminus \mathrm{FR}[1]_{Hnl}$:

$$U(x) \rightarrow \exists y z\; R(x, y), S(x, y, z)$$

Its shredding is as follows; it is not $\mathcal{GC}^2$-expressible:

$$U(x) \rightarrow \exists y z t\; R(x, y), S_1(t, x), S_2(t, y), S_3(t, z)$$

In the general case, the $\mathcal{GC}^2$ rewriting of Lemma 4.6 is obtained in PTIME by seeing the
body and head of SHR(r) as a tree, which is possible because r is fully-non-looping.

It is now easy to show the following general result: 

Proposition 4.8 (Shredding). For any fact F, $\mathcal{GC}^2$ constraints $\Sigma$, existential rules $\Delta$ and CQ
q, the following are equivalent:

• $F \wedge \Sigma \wedge \Delta \models q$;

• $\mathrm{SHR}(F) \wedge \Sigma \wedge \mathrm{SHR}(\Delta) \wedge \mathrm{wf}(\sigma_S) \models \mathrm{SHR}(q)$.

Thus, from Lemma 4.6, as SHR(F), SHR($\Delta$), $\sigma_S$, wf($\sigma_S$), and SHR(q) can be computed in
PTIME following their definition, we deduce the following, in the case of $\mathrm{FR}[1]_{Fnl}$:

Corollary 4.9. QA for $\mathcal{GC}^2$ and $\mathrm{FR}[1]_{Fnl}$ constraints can be reduced to QA for $\mathcal{GC}^2$ in PTIME;
further, when the constraints and query are fixed in the input, they also are in the output, so
data complexity bounds for $\mathcal{GC}^2$ QA are preserved.

This concludes the proof of Theorem 4.3 for $\mathrm{FR}[1]_{Fnl}$ constraints. It further implies that QA
for $\mathcal{GC}^2$ and $\mathrm{FR}[1]_{Fnl}$ has co-NP-complete data complexity, like $\mathcal{GC}^2$ [PH09], and the combined
complexity is the same as for $\mathcal{GC}^2$.

Note that, although QA for $\mathcal{GC}^2$ is decidable, we know of no realistic implementations. Our
translation could however reduce instead to arity-two QA with constraints in DLs such as
$\mathcal{ALCQI}b$, if we impose additional minor restrictions on the $\mathrm{FR}[1]_{Fnl}$ rules (e.g., no
atom of the form S(x, x)). For simplicity, however, we focus in the sequel on reductions to
decidable QA on arity-two (i.e., translating to $\mathcal{GC}^2$) rather than investigating which restrictions
would ensure that the output of our translations can be expressed in particular DLs.

Head-non-looping. We now extend the claim to $\mathrm{FR}[1]_{Hnl}$ rather than $\mathrm{FR}[1]_{Fnl}$. The idea is
that we rewrite $\mathrm{FR}[1]_{Hnl}$ rules to $\mathrm{FR}[1]_{Fnl}$ by treeifying them, considering all possible
fully-non-looping rules that they imply, and all possible ways that they can match on the parts of the
interpretations that satisfy the fact. To formalize this, we assume that we have added to the
fact F one atom $P_x(x)$ for each variable x of F, where each $P_x$ is a fresh unary relation. We
then define:

Definition 4.10. The treeification on fact F of a $\mathrm{FR}[1]_{Hnl}$ rule $r : \forall \mathbf{x}\, (\phi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(x_f, \mathbf{y}))$,
where $x_f \in \mathbf{x}$ is the frontier variable, is the conjunction $\mathrm{TR}_F(r)$ of $\mathrm{FR}[1]_{Fnl}$ rules defined as
follows:

• consider every mapping $f$ from $\mathbf{x}$ to itself, and let $f(r)$ be obtained from $r$ by renaming
all variables in $\mathbf{x}$ with $f$;

• for every such $f(r)$, consider every $\mathbf{x}' \subseteq \mathbf{x}$ and every mapping $g$ from $\mathbf{x}'$ to the variables
of F, and construct $g(f(r))$ by replacing every occurrence of each $x \in \mathbf{x}'$ in $f(r)$ by fresh
variables $x_1, \ldots, x_n$, and adding the facts $P_{g(x)}(x_i)$ for all $x \in \mathbf{x}'$ and all $i$ (if $x_f \in \mathbf{x}'$,
also replace $x_f$ in $\psi(x_f, \mathbf{y})$ by one of its copies);

• if $g(f(r))$ is fully-non-looping, add it to $\mathrm{TR}_F(r)$.

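Example 4.11. Consider a fact F and the following rule r:

$$R(x, y), S(y, z), T(z, w), U(w, x) \rightarrow A(x)$$

The treeification $\mathrm{TR}_F(r)$ contains the following rule, obtained with the mapping f that sends w to y
and fixes the other variables:

$$R(x, y), S(y, z), T(z, y), U(y, x) \rightarrow A(x)$$

Consider the rule $r' : R(x, y), S(y, x, x) \rightarrow A(x)$, and a fact F containing variable z. Then
$\mathrm{TR}_F(r')$ contains:

$$R(x_1, y), S(y, x_2, x_3), P_z(x_1), P_z(x_2), P_z(x_3) \rightarrow A(x_1)$$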

We now claim: 

Proposition 4.12. For any fact F, $\mathcal{GC}^2$ constraints $\Sigma$, $\mathrm{FR}[1]_{Hnl}$ rules $\Delta$ and CQ q, the
following are equivalent:

• $F \wedge \Sigma \wedge \Delta \models q$;

• $F \wedge \Sigma \wedge \mathrm{TR}_F(\Delta) \models q$.

This proposition implies that QA for $\mathrm{FR}[1]_{Hnl}$ and $\mathcal{GC}^2$ can be reduced to QA for $\mathrm{FR}[1]_{Fnl}$
and $\mathcal{GC}^2$, which is decidable by the Shredding Proposition, proving Theorem 4.3.

To prove Proposition 4.12, for the first direction, if $F \wedge \Sigma \wedge \Delta \not\models q$, one can show that all
of the fresh unary relations $P_x$ in an interpretation of $F \wedge \Sigma \wedge \Delta \wedge \neg q$ can be assumed to be
interpreted by one tuple. One then shows that $\Delta$ implies $\mathrm{TR}_F(\Delta)$ on such interpretations. For
the other direction, assuming that $F \wedge \Sigma \wedge \mathrm{TR}_F(\Delta) \not\models q$, the Shredding Proposition implies
that there is a $\sigma_S$-interpretation $\mathcal{J}$ of $\Theta := \Sigma \wedge \mathrm{SHR}(\mathrm{TR}_F(\Delta)) \wedge \mathrm{wf}(\sigma_S)$, $\neg q' := \neg \mathrm{SHR}(q)$, and
the existential closure of $F' := \mathrm{SHR}(F)$. We apply an unraveling argument to show that $\mathcal{J}$
can be made cycle-free:

Definition 4.13. The Gaifman graph $\mathcal{G}(\mathcal{I})$ of an interpretation $\mathcal{I}$ is the undirected graph
on dom($\mathcal{I}$) connecting any two elements co-occurring in a tuple of $\mathcal{I}$. Given a fact F, an
interpretation $\mathcal{I}$ is cycle-free except for F if F has a witness $\mathcal{W}$ in $\mathcal{I}$ such that any cycle
of $\mathcal{G}(\mathcal{I})$ is only on elements of dom($\mathcal{W}$).
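
For a finite interpretation and a given candidate witness, the condition of Definition 4.13 can be checked directly. The sketch below builds the Gaifman graph and verifies that no element outside the witness domain lies on a cycle, using the fact that a vertex lies on a cycle exactly when one of its incident edges is not a bridge. The dictionary encoding of interpretations and all function names are assumptions for illustration; the definition itself quantifies over witnesses, whereas this sketch takes the witness domain as an input.

```python
from collections import defaultdict
from itertools import combinations


def gaifman_graph(interp):
    """Adjacency sets of the Gaifman graph of `interp`, a dict mapping
    relation names to sets of tuples (hypothetical encoding)."""
    graph = defaultdict(set)
    for tuples in interp.values():
        for row in tuples:
            for u, v in combinations(set(row), 2):
                graph[u].add(v)
                graph[v].add(u)
    return graph


def connected_without_edge(graph, u, v):
    """Is v reachable from u when the edge {u, v} itself is ignored?"""
    seen, stack = {u}, [u]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if {node, nxt} == {u, v}:
                continue
            if nxt == v:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def cycle_free_except(interp, witness_domain):
    """True if every cycle of the Gaifman graph only uses witness elements."""
    graph = gaifman_graph(interp)
    for u in list(graph):
        if u in witness_domain:
            continue
        # u lies on a cycle iff some incident edge is not a bridge.
        if any(connected_without_edge(graph, u, v) for v in graph[u]):
            return False
    return True


triangle = {"R": {("a", "b"), ("b", "c"), ("c", "a")}}
```

For the triangle above, the check succeeds when the witness domain is {a, b, c} and fails when it is only {a, b}, since c then lies on a cycle.
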

Lemma 4.14 (Unraveling). For any $\sigma_S$-fact $F'$, $\mathcal{GC}^2$ constraints $\Theta$, and CQ $q'$, if
$(\exists \mathbf{x} \mathbf{t}\; F'(\mathbf{x}, \mathbf{t})) \wedge \Theta \wedge \neg q'$ is satisfiable then it has an interpretation which is cycle-free except for $F'$.

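Letting $\mathcal{J}'$ be the unraveling of our interpretation $\mathcal{J}$ (obtained by the Unraveling Lemma),
we can then “unshred” $\mathcal{J}'$ back to a $\sigma$-interpretation $\mathcal{I}$:

Definition 4.15. The unshredding $\mathcal{I}$ of a $\sigma_S$-interpretation $\mathcal{J} \models \mathrm{wf}(\sigma_S)$ is obtained by setting
$R^{\mathcal{I}} := R^{\mathcal{J}}$ for $R \in \sigma_{\leq 2}$, and, for all $R \in \sigma_{>2}$ and $t \in A_R^{\mathcal{J}}$, creating the tuple $\mathbf{a} \in R^{\mathcal{I}}$ such that
$(t, a_i) \in R_i^{\mathcal{J}}$ for all $1 \leq i \leq |R|$.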
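
Unshredding is the inverse of the shredding encoding and can be sketched as follows for finite interpretations. The dictionary encoding, the relation names `A_S`, `S_1`, ... as strings, and the `arities` argument listing the higher-arity relations are assumptions for illustration; the assertion inside the loop reflects the well-formedness constraints (each tuple element has exactly one value per attribute).

```python
def unshred(interp, arities):
    """Decode a well-formed shredded interpretation.

    interp:  dict mapping relation names to sets of tuples
    arities: dict mapping each higher-arity relation name to its arity
    """
    helpers = {"Elt"}
    for rel, n in arities.items():
        helpers.add(f"A_{rel}")
        helpers.update(f"{rel}_{i}" for i in range(1, n + 1))

    # Relations of arity at most two are copied unchanged.
    result = {name: set(rows) for name, rows in interp.items()
              if name not in helpers}

    for rel, n in arities.items():
        result[rel] = set()
        for (t,) in interp.get(f"A_{rel}", set()):
            attrs = []
            for i in range(1, n + 1):
                values = [a for (s, a) in interp.get(f"{rel}_{i}", set()) if s == t]
                assert len(values) == 1, "well-formedness violated"
                attrs.append(values[0])
            result[rel].add(tuple(attrs))
    return result


shredded = {
    "Elt": {("a",), ("b",)},
    "U": {("a",)},
    "A_S": {("t",)},
    "S_1": {("t", "b")},
    "S_2": {("t", "b")},
    "S_3": {("t", "a")},
}
```

Here `unshred(shredded, {"S": 3})` recovers the unary fact U(a) together with the ternary fact S(b, b, a).
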

As in the proof of the Shredding Proposition, we can show that the unshredding $\mathcal{I}$ is
well-defined and satisfies the unshredded constraints $(\exists \mathbf{x}\; F(\mathbf{x})) \wedge \Sigma \wedge \mathrm{TR}_F(\Delta) \wedge \neg q$. Further, we
show that it satisfies $\Delta$ and not just $\mathrm{TR}_F(\Delta)$, because a match of a $\mathrm{FR}[1]_{Hnl}$ rule r in $\mathcal{I}$ must
be a match of $\mathrm{TR}_F(r)$; otherwise the match would witness that $\mathcal{J}'$ was not cycle-free:

Lemma 4.16 (Soundness). For a $\sigma$-fact F, $\mathrm{FR}[1]_{Hnl}$ rule r and $\sigma_S$-interpretation $\mathcal{J}$, if $\mathcal{J}$
satisfies $\mathrm{SHR}(\mathrm{TR}_F(r))$ and is cycle-free except for SHR(F), then the unshredding $\mathcal{I}$ of $\mathcal{J}$
satisfies r.

We conclude by sketching the proof of the Unraveling Lemma, which follows [Kaz04, PH09].
From an interpretation $\mathcal{J}$ of $(\exists \mathbf{x} \mathbf{t}\; F'(\mathbf{x}, \mathbf{t})) \wedge \Theta \wedge \neg q'$, for all $u \neq v$ in dom($\mathcal{J}$) co-occurring in
some tuple of $\mathcal{J}$, we call a bag the interpretation with domain $\{u, v\}$ consisting of the tuples of
$\mathcal{J}$ mentioning only u, v. We build a graph G over the bags by connecting bags whose domain
shares one element. We pick a witness $\mathcal{W}$ of $F'$ in $\mathcal{J}$ and merge in the fact bag all bags whose
domain is included in dom($\mathcal{W}$).

An unraveling is a tree T of bags obtained by unfolding G starting at the fact bag, which 
is preserved as-is. Each bag b of T except the fact bag has a domain containing two elements: 
one of them occurs exactly in b, its siblings and its parent; the other occurs exactly in b and its 
children (it is introduced in b). We see T as an interpretation formed of the union of its bags. 

We construct T from G inductively. For any bag b in T corresponding to a bag b' in G, 
construct the children of b as follows. For each bag b" adjacent to b' in G, if b' and b" share 
the element corresponding to the element u introduced in b, create an isomorphic copy of b" as 
a child of b in T, whose domain is u plus a fresh element, and perform the unraveling process
recursively on the children. 

It can be shown that the unraveling operation preserves $\mathcal{GC}^2$ constraints, the fact $F'$, and
the negated CQ $\neg q'$. As T is a tree, the interpretation it describes is cycle-free (except for the
witness $\mathcal{W}$, because we copied the fact bag as-is).

Complexity. Proposition 4.12 gives a reduction from $\mathrm{FR}[1]_{Hnl}$ and $\mathcal{GC}^2$ QA to $\mathrm{FR}[1]_{Fnl}$ and $\mathcal{GC}^2$
QA, but its output is of exponential size in the input, because of treeification. Hence, letting
f(n) bound the size of the output of our reduction given an input of size n, and letting g(n)
bound the combined complexity of $\mathcal{GC}^2$ QA, we have shown an upper bound of g(f(n)) for QA
for $\mathrm{FR}[1]_{Hnl}$ and $\mathcal{GC}^2$.

Further, treeification rewrites the rules in a fact-dependent way, so, unlike the previous case
of $\mathrm{FR}[1]_{Fnl}$ and $\mathcal{GC}^2$ QA, data complexity bounds for $\mathcal{GC}^2$ QA do not imply data complexity
bounds for $\mathrm{FR}[1]_{Hnl}$ and $\mathcal{GC}^2$ QA.

5. Adding Functional Dependencies 

The previous section showed that the language of head-non-looping frontier-one rules is not
destructive of $\mathcal{GC}^2$ QA. However, another kind of constraint that we would want to support on
higher-arity relations is functional dependencies (FDs).

It is well-known that QA is undecidable for, e.g., ID[2] and arbitrary FDs [CLR03a], so such
constraints are trivially destructive. As it turns out, undecidability also holds for $\mathrm{FR}[1]_{Hnl}$
rules and FDs; in fact, even for single-head FR[1] rules and FDs:

Theorem 5.1. QA is undecidable for FDs and single-head frontier-one rules, even if all FDs
have a determiner of size 1.

However, for certain kinds of existential rules and FDs, QA is known to be decidable: this 
is in particular the case of non-conflicting rules and FDs [CGP12]: 

Definition 5.2. We say that a single-head existential rule r is non-conflicting with respect to
a set of FDs $\Phi$ if, letting $A = R(\mathbf{z})$ be the head atom of r, letting S be the subset of $\{1, \ldots, |R|\}$
such that $z_i$ is a frontier variable iff $i \in S$:

• No strict subset of S is the determiner of an FD in $\Phi$;

• If S is exactly the determiner of an FD of $\Phi$, then all existentially quantified variables
in A occur only once.
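
The two conditions of Definition 5.2 are purely syntactic and can be checked as in the Python sketch below. The encoding of FDs as (relation, determiner, determined position) triples and the function name are assumptions for illustration; the sketch also assumes that only FDs on the relation of the head atom are relevant.

```python
from collections import Counter


def is_non_conflicting(head_atom, frontier_vars, fds):
    """head_atom:     (relation, args) pair, the single head atom of the rule
    frontier_vars: set of frontier variables of the rule
    fds:           iterable of (relation, determiner, determined) triples,
                   with 1-indexed positions (hypothetical encoding)
    """
    rel, args = head_atom
    # S in the definition: positions of the head atom holding frontier variables.
    exported = {i for i, z in enumerate(args, start=1) if z in frontier_vars}
    counts = Counter(args)
    repeated_existential = any(
        counts[z] > 1 for z in args if z not in frontier_vars
    )
    for fd_rel, determiner, _determined in fds:
        if fd_rel != rel:
            continue
        determiner = set(determiner)
        # First condition: no FD determiner is a strict subset of S.
        if determiner.issubset(exported) and determiner != exported:
            return False
        # Second condition: if S is a determiner, existential variables
        # of the head atom must occur only once.
        if determiner == exported and repeated_existential:
            return False
    return True


# FD on R: position 1 determines position 2.
fds = [("R", {1}, 2)]
```

With these FDs, a rule with head R(x, y, z) and frontier {x} is non-conflicting, whereas heads R(x, y, y) and R(x, x, y) with the same frontier violate the second and the first condition respectively.
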

Note that this requires rules to be single-head, and thus head-non-looping. Our result with
respect to adding FDs is: 

Theorem 5.3. Non-conflicting frontier-one rules and FDs are non-destructive of arity-two 
QA. 

In particular, single-head frontier-one rules and FDs are non-destructive of arity-two QA if 
all variables in the head atom of rules are assumed to have only one occurrence, as this simple 
sufficient condition implies the non-conflicting condition. 

To prove the theorem, we assume without loss of generality that we only have FDs on higher-arity
relations, as we can write them in $\mathcal{GC}^2$ otherwise. We cannot shred the FDs, as they
would translate to a functionality assertion for the path, e.g., $R_i^- \circ R_j$, which is not expressible
in $\mathcal{GC}^2$ (and not even in expressive DLs such as $\mathcal{SROIQ}$ [HKS06]). However, we can show that,
thanks to the non-conflicting requirement, FDs can always be made to hold on interpretations,
as long as they hold on a witness of the fact.

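Proposition 5.4. For any GC$^2$ constraints $\Sigma$, non-conflicting frontier-one rules $\Delta$,
FDs $\Phi$ on $\sigma_{>2}$, $\sigma$-fact $F$, and CQ $q$, if there is an interpretation $\mathcal{I}$
satisfying $\Theta := (\exists \mathbf{x}\, F(\mathbf{x})) \wedge \Sigma \wedge \Delta \wedge \neg q$
and there is a witness $W$ of $F$ in $\mathcal{I}$ satisfying $\Phi$, then $\Theta \wedge \Phi$ is
satisfiable.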

We first prove Proposition 5.4. As in Section 4, consider the treeification
$\mathrm{TR}_F(\Delta)$: it is still non-conflicting, as treeification only affects rule bodies. Use
the Shredding Proposition to obtain an interpretation $\mathcal{J}$ of
$\neg q' := \neg\mathrm{SHR}(q)$,
$\Theta := \Sigma \wedge \mathrm{SHR}(\mathrm{TR}_F(\Delta)) \wedge \mathrm{wf}(\sigma_S)$, and the
existential closure of $F' := \mathrm{SHR}(F)$. By our hypothesis about the existence of a witness,
we can assume that $\mathcal{J}$ has a witness $W$ of $F'$ whose unshredding satisfies $\Phi$.

In the previous section, we used the Unraveling Lemma to show that J could be assumed to 
be cycle-free. We now modify the lemma to additionally ensure the following property on J, 
which will forbid FD violations in its unshredding: 

Definition 5.5. Given a set of FDs $\Phi$ on $\sigma_{>2}$, a $\sigma_S$-interpretation
$\mathcal{J}$, and a witness $W$ of a fact in $\mathcal{J}$, we call $\mathcal{J}$ FD-safe except for
$W$ if for every $a \in \mathrm{dom}(\mathcal{J})$, for any $R \in \sigma_{>2}$ and FD determiner $P$
of $R$ in $\Phi$, considering each $t \in \mathrm{dom}(\mathcal{J})$ such that
$(t, a) \in R_i^{\mathcal{J}}$ for every $i \in P$, either there is at most one such $t$ or all are
in $\mathrm{dom}(W)$.

FD-safety is useful for the following reason: 

Lemma 5.6. For any set of FDs $\Phi$ on $\sigma_{>2}$, for any $\sigma_S$-interpretation
$\mathcal{J}$ which is cycle-free and FD-safe except for a witness $W$, if the unshredding of $W$
satisfies $\Phi$, then the unshredding of $\mathcal{J}$ satisfies $\Phi$.

We now claim a variant of the Unraveling Lemma: 

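Lemma 5.7 (FD-aware unraveling). Let $\Sigma$ be a GC$^2$ constraint, $F$ a $\sigma$-fact, $q$ a CQ,
$\Delta$ non-conflicting frontier-one rules and $\Phi$ a set of FDs on $\sigma_{>2}$. Let
$\mathcal{J}$ be an interpretation satisfying
$\Theta := (\exists \mathbf{x}\mathbf{t}\; \mathrm{SHR}(F)(\mathbf{x}, \mathbf{t})) \wedge \Sigma \wedge \mathrm{SHR}(\mathrm{TR}_F(\Delta)) \wedge \mathrm{wf}(\sigma_S) \wedge \neg\mathrm{SHR}(q)$,
and $W$ a witness of $\mathrm{SHR}(F)$ in $\mathcal{J}$. Then there is an interpretation
$\mathcal{J}'$ satisfying $\Theta$ such that $W$ is a witness of $\mathrm{SHR}(F)$ in $\mathcal{J}'$,
and $\mathcal{J}'$ is cycle-free and FD-safe except for $W$.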

We prove the lemma by tweaking the unraveling process to ensure FD-safety: when creating
children of each bag $b$ in the unraveling $T$ for neighbors of its corresponding bag $b'$ in the
bag graph $G$, omit some neighbors that contain shreddings of higher-arity tuples if the shared
element $u$ occurs in a strict superset of an FD determiner of $\Phi$, and unravel differently the
neighbors where $u$ occurs exactly at a determiner. This unraveling still satisfies $\Sigma$,
$\neg q'$, and the existential closure of $F'$, and satisfies
$\mathrm{SHR}(\mathrm{TR}_F(\Delta))$: the non-conflicting condition ensures that the omitted facts
were not required by a rule.

We then apply the FD-aware Unraveling Lemma to $\mathcal{J}$ and consider the unshredding
$\mathcal{I}$ of the result; it satisfies all necessary constraints as in Section 4, including
$\Phi$ by Lemma 5.6. This proves Proposition 5.4.

We conclude by proving Theorem 5.3. We first observe that the results of Section 4 extend
to a more general notion of fact that allows inequality axioms ($x \neq y$); indeed, inequalities
in the fact are preserved by shredding and unshredding, and by unraveling. So Theorem 4.3
holds for such facts with inequalities, with the same complexity. Second, we enumerate all
possible equalities between variables of the fact $F$, and for each possibility, consider the fact
$F_=$ where variables are merged following the equalities, and inequalities are asserted between
the remaining variables. Proposition 5.4 implies that our original entailment holds iff all the
derived entailments hold where $F$ is replaced by some $F_=$ whose canonical interpretation
satisfies $\Phi$ (this can be tested in PTIME for each $F_=$). Thus we have reduced to QA for
FR[1]$_{\mathrm{fnl}}$ and GC$^2$.
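The enumeration of all possible equalities between the variables of $F$ amounts to enumerating the set partitions of those variables. The sketch below illustrates this step in Python under assumptions made here for illustration only: a fact is a list of `(relation, args)` pairs, each block of a partition is represented by its first variable, and the helper names are hypothetical.

```python
def partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into an existing block ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + part

def merged_facts(atoms):
    """Yield (merged fact, inequalities between representatives) per partition."""
    variables = sorted({v for _, args in atoms for v in args})
    for part in partitions(variables):
        rep = {v: block[0] for block in part for v in block}
        fact = sorted({(r, tuple(rep[v] for v in args)) for r, args in atoms})
        reps = sorted(block[0] for block in part)
        ineqs = [(a, b) for i, a in enumerate(reps) for b in reps[i + 1:]]
        yield fact, ineqs

def satisfies_fd(fact, relation, determiner, determined):
    """Check one FD (1-based positions) on the canonical interpretation of a fact."""
    seen = {}
    for r, args in fact:
        if r != relation:
            continue
        key = tuple(args[i - 1] for i in determiner)
        if seen.setdefault(key, args[determined - 1]) != args[determined - 1]:
            return False
    return True
```

The number of candidates is the Bell number of the number of variables, which is the exponential factor discussed in the complexity remark that follows; each individual FD check is a single pass over the merged fact.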

In terms of complexity, as GC 2 QA is EXPTIME-hard in combined complexity (because 
satisfiability for the usual two-variable guarded fragment is EXPTIME-hard [Gra99]), the 
additional exponential factor (from all possible F=) has no impact, so the bounds of Section 4 
also apply to QA for GC 2 and non-conflicting frontier-one rules and FDs. 

6. Conclusion 

In this paper, we have studied the impact of existential rules on the decidability of query 
answering for classes of arity-two constraints. We also explained (in proving Theorem 5.3) how 
the decidability extends when inequalities are allowed in facts. 

We have limited our arbitrary arity constraints to rules, i.e., dependencies. In future work 
we will study how to extend our results to arbitrary arity constraint languages with more 
features, e.g., disjunction. We will also study what happens in the presence of constants (or 
nominals), which are disallowed in GC 2 (and in the rule languages we consider), but are known 
not to break decidability in arity-two contexts [RG10, CEO09]. This, however, would probably 
require different techniques, as unraveling may create multiple copies of constants. Another 
question that would probably require specific tools is the study of finite QA, i.e., QA restricted 
to finite interpretations.

A. Proofs for Section 3: Negative Results for Combination

A.1. Proof of Theorem 3.1: S2T is destructive

Adapt the proof of Theorem 3.3 by rewriting $r$ to replace all $S$-atoms in the right-hand side by
$S'$-atoms. The resulting rule is clearly source-to-target, with $\sigma_{\mathcal{S}} = \{S\}$ and
$\sigma_{\mathcal{T}} = \{D, R, S'\}$. Now impose the concept inclusion $S' \sqsubseteq S$. It is
clear that the resulting rules are equivalent to those of Theorem 3.3, so the same proof applies.

A.2. Proof of Theorem 3.2: ID[2] is destructive

In this section, as in the rest of the appendix, we write the positions of any relation $R$ as
$R^1, \ldots, R^{|R|}$.

We will show this undecidability result by considering the entailment problem.

Definition A.1. The (unrestricted) entailment problem for two classes CL$_1$, CL$_2$ of
constraints asks, given a set of constraints $\Sigma$ of CL$_1$ and a constraint $r \in$ CL$_2$,
whether $\Sigma$ entails $r$, written $\Sigma \models r$; that is, whether every interpretation
satisfying $\Sigma$ also satisfies $r$.

We show a reduction from entailment for a class of logical constraints and a class of rules to QA
for this class of constraints. The idea follows [CLR03b] (Theorem 3.4) but is slightly more
complicated, to take care of a difficulty that was omitted there.

Lemma A.2. For any class CL$_1$ of constraints and CL$_2$ of existential rules, there is a
reduction from entailment for CL$_1$ and CL$_2$ to QA for CL$_1$.

Proof. Consider an instance of the entailment problem for CL$_1$ and CL$_2$: $\Sigma$ is a set of
constraints of CL$_1$, and
$r : \forall \mathbf{x}\, (\phi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x}', \mathbf{y}))$
with $\mathbf{x}' \subseteq \mathbf{x}$ is an existential rule of CL$_2$. Let us reduce this to an
instance of the QA problem for CL$_1$.

Create fresh unary relations $P_x$ for each $x \in \mathbf{x}$. We consider the QA instance asking
whether $F \wedge \Sigma \models q$, where the fact $F$ is
$\phi(\mathbf{x}) \wedge \bigwedge_{x \in \mathbf{x}} P_x(x)$ and the query $q$ is
$\exists \mathbf{x}\mathbf{y}\; \phi(\mathbf{x}) \wedge \psi(\mathbf{x}', \mathbf{y}) \wedge \bigwedge_{x \in \mathbf{x}} P_x(x)$.
We claim that $F \wedge \Sigma \models q$ iff $\Sigma \models r$, which proves that the reduction is
correct.

If $\Sigma \models r$, then consider an interpretation $\mathcal{I}$ satisfying $\Sigma$ and the
existential closure of $F$. As $\mathcal{I} \models \Sigma$ and $\Sigma \models r$, we have
$\mathcal{I} \models r$; thus, applying $r$ to any witness of $F$ in $\mathcal{I}$, we deduce the
existence of a match of $q$. This proves that $F \wedge \Sigma \models q$.

Conversely, if $\Sigma \not\models r$, there exists an interpretation $\mathcal{I}$ of $\Sigma$ that
does not satisfy $r$, meaning that there is a violation of $r$ in $\mathcal{I}$: a tuple
$\mathbf{b}$ of elements of $\mathrm{dom}(\mathcal{I})$ such that
$\mathcal{I} \models \phi(\mathbf{b})$ but this match cannot be extended to a match of $\psi$. Let
us modify $\mathcal{I}$ to $\mathcal{I}'$ by setting, for each $x \in \mathbf{x}$,
$P_x^{\mathcal{I}'} := \{(b)\}$, where $b$ is the element of $\mathbf{b}$ corresponding to
$x \in \mathbf{x}$, and setting $R^{\mathcal{I}'} := R^{\mathcal{I}}$ for all other relations $R$.
It is clear that $\mathcal{I}'$ still satisfies $\Sigma$, as $\Sigma$ does not mention the fresh
unary relations $P_x$. Now, we also have $\mathcal{I}' \models \phi(\mathbf{b})$, and by
construction $\mathcal{I}' \models \bigwedge_{b \in \mathbf{b}} P_x(b)$, so that $\mathcal{I}'$
satisfies the existential closure of $F$. However, $\mathcal{I}'$ does not satisfy $q$: the only
possible match of $q$ is on the elements that occur in the $P_x$, and this match cannot be extended
to a match of $\psi$, by definition of $\mathbf{b}$ being a violation of $r$. Hence,
$\mathcal{I}'$ witnesses that $F \wedge \Sigma \not\models q$.

Hence, the reduction is correct, which concludes the proof. □ 
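To make the construction in the proof of Lemma A.2 concrete, here is a minimal Python sketch that builds the fact $F$ and the query $q$ from the body and head of a rule. The atom encoding and the marker names `"P_" + x` are hypothetical choices made for this illustration.

```python
def entailment_to_qa(body, head):
    """Build (fact, query) from the rule  forall x (body(x) -> exists y head(x', y)).

    body, head: lists of atoms, each a pair (relation, tuple_of_variable_names).
    Both results are lists of atoms; the query is read as existentially closed.
    """
    body_vars = sorted({v for _, args in body for v in args})
    # One fresh unary relation P_x per body variable x
    markers = [("P_" + v, (v,)) for v in body_vars]
    fact = list(body) + markers
    query = list(body) + list(head) + markers
    return fact, query
```

The markers pin any match of the query to the witness of the fact, which is the role played by the relations $P_x$ in the proof.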

Thus, let $\mathcal{D}$ be a DL that can express the assertions funct($R$) for any binary relation
$R$. To show the undecidability of QA for $\mathcal{D} \wedge$ ID[2], by the above, it suffices to
show the undecidability of entailment for $\mathcal{D} \wedge$ ID[2] and ID[2].

Definition A.3. We call UFD the class of unary functional dependencies (UFDs), that is,
functional dependencies (on arbitrary arity relations) whose determiner consists of a single
attribute. We write UFDs as $R^p \rightarrow R^q$, where $R^p$ and $R^q$ are positions of a
higher-arity relation $R$.

We now claim that functionality assertions on binary relations can be bootstrapped to UFDs 
on arbitrary arity relations, using ID[2]: 

Lemma A.4. There is a reduction from entailment for UFD $\wedge$ ID[2] and ID[2] to entailment
for $\mathcal{D} \wedge$ ID[2] and ID[2].

Proof. Consider constraints $\Sigma$ of UFD $\wedge$ ID[2] and a rule $r \in$ ID[2]. Encode each
UFD $\varphi : R^p \rightarrow R^q$ of $\Sigma$ as an ID[2] rule
$r_\varphi : \forall \mathbf{x}\; R(\mathbf{x}) \rightarrow R_\varphi(x_p, x_q)$, where $R_\varphi$
is a fresh binary relation for $\varphi$, and a functionality assertion funct($R_\varphi$). Let the
constraints $\Sigma'$ consist of the original ID[2] rules, the new ID[2] rules, and the
functionality assertions. We claim that $\Sigma \models r$ iff $\Sigma' \models r$.

If $\Sigma' \not\models r$, let $\mathcal{I}$ be a counterexample interpretation satisfying
$\Sigma'$ but not $r$. We claim that $\mathcal{I}$ also satisfies $\Sigma$. Indeed, the only thing
to check is that UFDs are satisfied; but assume that there is a UFD
$\varphi : R^p \rightarrow R^q$ of $\Sigma$ that has a violation in $\mathcal{I}$, namely, two
tuples $\mathbf{a}, \mathbf{b} \in R^{\mathcal{I}}$ such that $a_p = b_p$ but $a_q \neq b_q$. As
$\mathcal{I}$ satisfies the ID[2] rule $r_\varphi$, we have
$(a_p, a_q) \in R_\varphi^{\mathcal{I}}$ and $(b_p, b_q) \in R_\varphi^{\mathcal{I}}$; this
contradicts the assertion funct($R_\varphi$) that $\mathcal{I}$ is supposed to respect. Hence
$\mathcal{I}$ satisfies $\Sigma$, and as $\mathcal{I}$ does not satisfy $r$, it witnesses that
$\Sigma \not\models r$.

Conversely, if $\Sigma \not\models r$, let $\mathcal{I}$ be a counterexample interpretation
satisfying $\Sigma$ but not $r$. Without loss of generality, we have
$R_\varphi^{\mathcal{I}} = \emptyset$ for all the fresh relations $R_\varphi$, as they are not
mentioned in $\Sigma$. Now, extend $\mathcal{I}$ to an interpretation $\mathcal{I}'$ that satisfies
$\Sigma'$ by adding to $R_\varphi^{\mathcal{I}'}$, for every UFD $\varphi : R^p \rightarrow R^q$,
for every $\mathbf{a} \in R^{\mathcal{I}}$, the tuple $(a_p, a_q)$. It is clear that the result
$\mathcal{I}'$ still satisfies the ID[2] rules of $\Sigma$, and that it satisfies the ID[2] rules of
$\Sigma'$; and it is easily seen that it satisfies the functionality assertions as otherwise, as
before, a violation of such an assertion in $\mathcal{I}'$ witnesses a violation of the UFDs of
$\Sigma$ in $\mathcal{I}$. Further, as $r$ does not mention the $R_\varphi$, $\mathcal{I}'$ still
does not satisfy $r$, because $\mathcal{I}$ did not. Hence, $\mathcal{I}'$ witnesses that
$\Sigma' \not\models r$.

This shows that the reduction is correct, concluding the proof. □ 
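The encoding used in the proof of Lemma A.4 can be illustrated on a concrete relation instance: a UFD $R^p \rightarrow R^q$ holds exactly when the projection of $R$ on positions $p$ and $q$ (the least content of $R_\varphi$ allowed by the new ID[2] rule) is functional. The Python sketch below is a small illustration with hypothetical helper names and 1-based positions.

```python
def satisfies_ufd(tuples, p, q):
    """Direct check of the UFD R^p -> R^q on a set of tuples."""
    image = {}
    for t in tuples:
        if image.setdefault(t[p - 1], t[q - 1]) != t[q - 1]:
            return False
    return True

def encode_ufd(tuples, p, q):
    """Least binary relation R_phi satisfying  R(x) -> R_phi(x_p, x_q)."""
    return {(t[p - 1], t[q - 1]) for t in tuples}

def is_functional(pairs):
    """Check the assertion funct(R_phi) on a set of pairs."""
    image = {}
    for a, b in pairs:
        if image.setdefault(a, b) != b:
            return False
    return True
```

On any instance, `satisfies_ufd(R, p, q)` and `is_functional(encode_ufd(R, p, q))` agree, which mirrors the two directions of the proof.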

Definition A.5. The class of frontier-one inclusion dependencies (or unary inclusion
dependencies), ID[1], is the class of inclusion dependencies with frontier of size 1. We write an
ID[1] rule
$\forall \mathbf{x}\, (R(\mathbf{x}) \rightarrow \exists \mathbf{y}\, S(x', \mathbf{y}))$ as
$R^p \subseteq S^q$, where $R^p$ and $S^q$ are the positions at which the frontier variable occurs
in the body and head atom respectively.

Following this convention, we write rules of ID[2] in the same way: $R^a R^b \subseteq S^c S^d$
denotes the rule
$\forall \mathbf{x}\, (R(\mathbf{x}) \rightarrow \exists \mathbf{y}\, S(\mathbf{x}', \mathbf{y}))$
where the first frontier variable occurs at positions $R^a$ and $S^c$ in the body and head, and the
second occurs at positions $R^b$ and $S^d$ in the body and head. (Remember that the definition of
ID requires each variable to only occur once in the body atom and head atom.) Note that we must
have $R^a \neq R^b$ and $S^c \neq S^d$; but we may have $R^a = S^c$ or $R^a = S^d$, and similarly
for $R^b$.

We now explain that we can add frontier-one inclusion dependencies, ID[1], to the entailment
problem without loss of generality, the reason being that ID[1] rules can be encoded in ID[2] by
adding additional attributes.

Lemma A.6. There is a reduction from entailment for UFD $\wedge$ ID[1] $\wedge$ ID[2] and ID[2] to
entailment for UFD $\wedge$ ID[2] and ID[2].

Proof. Consider constraints $\Sigma$ of UFD $\wedge$ ID[1] $\wedge$ ID[2] and $r \in$ ID[2]. Let
$\sigma_+$ be the signature obtained from $\sigma$ in the following way: for each relation
$R \in \sigma$, we create a relation $R_+$ in $\sigma_+$ whose positions are those of $R$ plus one
position $R_+^{\delta,1}$ for each ID[1] rule $\delta$ of the form $R^p \subseteq S^q$, and one
position $R_+^{\delta,2}$ for each ID[1] rule $\delta$ of the form $S^q \subseteq R^p$.

Now, encode each ID[1] rule $\delta : R^p \subseteq S^q$ of $\Sigma$ as the following ID[2] rule on
$\sigma_+$: $R_+^p R_+^{\delta,1} \subseteq S_+^q S_+^{\delta,2}$. We thus define the constraints
$\Sigma'$ on $\sigma_+$ to consist of these additional ID[2] rules, and of the straightforward
rewriting of the original ID[2] and UFD constraints of $\sigma$ to $\sigma_+$, rewriting, e.g.,
$R^a R^b \subseteq S^c S^d$ as $R_+^a R_+^b \subseteq S_+^c S_+^d$, and $R^p \rightarrow R^q$ as
$R_+^p \rightarrow R_+^q$. Once again we show that $\Sigma \models r$ iff $\Sigma' \models r$.

If $\Sigma \not\models r$, then we extend a counterexample $\sigma$-interpretation $\mathcal{I}$ to
a $\sigma_+$-interpretation $\mathcal{I}'$ satisfying $\Sigma'$ as follows: for all
$R \in \sigma$, consider each tuple $\mathbf{a} \in R^{\mathcal{I}}$, and create in
$R_+^{\mathcal{I}'}$ the tuple $\mathbf{b}$ defined by $b_p := a_p$ for all positions $R^p$ of $R$,
$b_{\delta,1} := a_p$ for each $\delta$ of the form $R^p \subseteq S^q$, and
$b_{\delta,2} := a_p$ for each $\delta$ of the form $S^q \subseteq R^p$. It is clear that the
result $\mathcal{I}'$ still satisfies the UFD and ID[2] constraints of $\Sigma'$ and violates $r$,
because they do not mention the new attributes of $\sigma_+$. Further, $\mathcal{I}'$ clearly
satisfies the new ID[2] rules because the original interpretation $\mathcal{I}$ satisfied the ID[1]
rules. Hence, $\mathcal{I}'$ witnesses that $\Sigma' \not\models r$.

Conversely, if $\Sigma' \not\models r$, we rewrite a counterexample $\sigma_+$-interpretation
$\mathcal{I}'$ to a $\sigma$-interpretation by simply removing the additional attributes in all
tuples, which clearly gives an interpretation satisfying $\Sigma$: it satisfies the ID[1] rules
because $\mathcal{I}'$ satisfied the new ID[2] rules of $\Sigma'$, and the other constraints are
preserved. This concludes the proof. □

We are now ready to conclude, because: 

Theorem A.7 ([Mit83]). The entailment problem for UFD $\wedge$ ID[1] $\wedge$ ID[2] and ID[2] is
undecidable.

This is a slightly stronger result than what is claimed in [Mit83], because their definition of
ID[2] does not forbid repetitions of positions (i.e., it allows ID[2] rules of the form
$R^p R^q \subseteq R^r R^r$). We refer to Appendix C.1 for more details about how the stronger
result is proved.

This concludes the proof of Theorem 3.2, because, if QA for ID[2] $\wedge\, \mathcal{D}$ were
decidable, then we would have decidability of the entailment problem above, by reducing it
successively through Lemma A.6, Lemma A.4, and Lemma A.2.

A.3. Proof of Theorem 3.3: FR[1] is destructive 

Formally, we define the satisfiability problem of a fact $F$ and constraints $\Theta$ as checking
whether there is an interpretation of $\Theta$ and of the existential closure of $F$. We will show
that the satisfiability problem is undecidable, not for FR[1] $\wedge$ GC$^2$, but for the weaker
FR[1] $\wedge\, \mathcal{ALCF}$. The DL $\mathcal{ALCF}$ is GC$^2$-expressible; in addition to the
constructors of Section 2, it also allows disjunction of concepts:
$C_1 \sqcup \cdots \sqcup C_n$.

We use tiling systems, following the notations of [PH09]. Let $\mathcal{T} = (C, H, V)$ be a tiling
system where $C = C_1, \ldots, C_N$ is a non-empty finite set of tiles and $H, V \subseteq C^2$ are
binary relations (intuitively standing for “horizontal” and “vertical”).

Given a sequence $c = c_0, c_1, \ldots, c_n$, the infinite tiling problem for $c$ is to determine
whether there exists an infinite tiling, that is, a function $f : \mathbb{N}^2 \rightarrow C$ such
that $f(i, 0) = c_i$ for $0 \leq i \leq n$ and for all $i, j \in \mathbb{N}$,
$(f(i, j), f(i + 1, j)) \in H$ and $(f(i, j), f(i, j + 1)) \in V$. It is known that we can choose a
fixed $\mathcal{T}$ such that the infinite tiling problem that has $c$ as input is undecidable.
Hence, fix such a $\mathcal{T}$ in what follows.
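As an illustration of the tiling conditions just stated, the following Python sketch checks them on a finite rectangular patch. This is only a necessary condition: an infinite tiling cannot be verified this way, and the problem itself is undecidable. The encoding (tiles as arbitrary hashable values, `f` as a dictionary) and the function name are assumptions made for this example.

```python
def is_valid_patch(f, width, height, H, V, c):
    """Check the tiling conditions on the patch {0..width-1} x {0..height-1}.

    f: dict mapping (i, j) to a tile; H, V: sets of allowed tile pairs;
    c: initial sequence, required on the row j = 0 (assumes len(c) <= width).
    """
    # Initial condition: f(i, 0) = c_i
    if any(f[(i, 0)] != c[i] for i in range(len(c))):
        return False
    for i in range(width):
        for j in range(height):
            # Horizontal compatibility between (i, j) and (i + 1, j)
            if i + 1 < width and (f[(i, j)], f[(i + 1, j)]) not in H:
                return False
            # Vertical compatibility between (i, j) and (i, j + 1)
            if j + 1 < height and (f[(i, j)], f[(i, j + 1)]) not in V:
                return False
    return True
```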

We consider the (single) FR[1] rule: 

$$r : \forall u\, \big(S(u) \rightarrow \exists x y z\; R(u, x) \wedge D(u, y) \wedge R(y, z) \wedge D(x, z) \wedge S(x) \wedge S(y) \wedge S(z)\big)$$

We impose the functionality restrictions funct($R$) and funct($D$). Intuitively, $R$ stands for
“right” and $D$ for “down”.

We create one concept $C_i$ for each tile in $C$. We impose the disjointness assertions
$C_i \sqcap C_j \sqsubseteq \bot$ for all $i \neq j$.

We impose the concept inclusion $S \sqsubseteq C_1 \sqcup \cdots \sqcup C_N$.

We impose the concept inclusions
$C_i \sqsubseteq \exists R\, C_{j_1} \sqcup \cdots \sqcup \exists R\, C_{j_l}$ where
$C_{j_1}, \ldots, C_{j_l}$ are all the tiles such that $(C_i, C_{j_k}) \in H$ for
$1 \leq k \leq l$. Having done this for $R$ and $H$, we do the same with $D$ and $V$.

We are now ready to conclude the reduction. We claim that the infinite tiling problem for
$\mathcal{T}$ and the input $c$ reduces to the satisfiability of the fact $F_c$ and the constraints
that we have imposed, where we define:

$$F_c(x_0, \ldots, x_n) := S(x_0) \wedge \bigwedge_{0 \leq i \leq n} C_{c_i}(x_i) \wedge \bigwedge_{0 \leq i < n} R(x_i, x_{i+1})$$

Let us prove that, indeed, $F_c$ and the constraints are satisfiable iff the infinite tiling
problem for $\mathcal{T}$ with input $c$ has a solution.

Assume that the infinite tiling problem for $\mathcal{T}$ and $c$ has a solution $f$. Consider the
interpretation $\mathcal{I}$ such that
$\mathrm{dom}(\mathcal{I}) = \{a_{i,j} \mid i, j \in \mathbb{N}\}$, defined as follows:

$$\begin{aligned}
S^{\mathcal{I}} &:= \{(a_{i,j}) \mid i, j \in \mathbb{N}\} \\
R^{\mathcal{I}} &:= \{(a_{i,j}, a_{i+1,j}) \mid i, j \in \mathbb{N}\} \\
D^{\mathcal{I}} &:= \{(a_{i,j}, a_{i,j+1}) \mid i, j \in \mathbb{N}\} \\
C_k^{\mathcal{I}} &:= \{(a_{i,j}) \mid i, j \in \mathbb{N},\ f(i, j) = C_k\} \quad \text{for all } 1 \leq k \leq N
\end{aligned}$$

The interpretation $\mathcal{I}$ satisfies the rule $r$, the disjointness assertions, the concept
inclusions (this uses the fact that $f$ is a tiling for $\mathcal{T}$), and the existential closure
of $F_c$, so the fact and constraints are satisfiable.

Conversely, let $\mathcal{I}$ be an interpretation satisfying the constraints and the existential
closure of $F_c$. From the fact that $\mathcal{I}$ satisfies the existential closure of $F_c$, as
$\mathcal{I}$ satisfies $r$ and the two functionality assertions, we can build from $\mathcal{I}$
an infinite grid of $R$ and $D$ edges whose top left corner is a match of variable $x_0$ of $F_c$,
such that all vertices are in $S^{\mathcal{I}}$. Let us index the elements of this grid as
$a_{i,j}$. The constraints impose that each $a_{i,j}$ carries exactly one tile, so we can define a
function $f : \mathbb{N}^2 \rightarrow C$ that maps $(i, j)$ to the one $C_k$ such that
$a_{i,j} \in C_k^{\mathcal{I}}$ holds. The constraints ensure that $f$ is a valid tiling for
$\mathcal{T}$, so the infinite tiling problem for $\mathcal{T}$ and $c$ has a solution.

This concludes the proof that the reduction is correct, so from the undecidability of the tiling
problem we deduce the undecidability of satisfiability for a fact, a FR[1] rule, and constraints
in $\mathcal{ALCF}$. This implies the claim of Theorem 3.3.


B. Proofs for Section 4: From Existential Rules to Arity-Two 

B.1. Proof of Lemma 4.6: Shreddings of FR[1]$_{\mathrm{fnl}}$ are in GC$^2$

Definition B.1. Recall Definition 4.13: we call a $\sigma_S$-interpretation $\mathcal{J}$
cycle-free if the Gaifman graph $\mathcal{G}(\mathcal{J})$ of $\mathcal{J}$ is acyclic.

We call a frontier-one existential rule on $\sigma_S$ cycle-free if the conjunctions of atoms of
its head and body are cycle-free.

We call a CQ $q$ cycle-free if its Gaifman graph is acyclic, defining the Gaifman graph
$\mathcal{G}(q)$ to have the variables of $q$ as vertices and an edge between any pair of variables
that co-occur in an atom of $q$.
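Cycle-freeness of a CQ in the sense of Definition B.1 can be tested with a standard union-find pass over the edges of its Gaifman graph. The Python sketch below is an illustration only; the atom encoding `(relation, args)` and the function names are assumptions of this example.

```python
def gaifman_edges(atoms):
    """Edges of the Gaifman graph: pairs of distinct variables sharing an atom."""
    edges = set()
    for _, args in atoms:
        vs = sorted(set(args))
        edges.update((a, b) for i, a in enumerate(vs) for b in vs[i + 1:])
    return edges

def is_cycle_free(atoms):
    """Return True iff the Gaifman graph of the atoms is acyclic (a forest)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in gaifman_edges(atoms):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # this edge closes a cycle
        parent[ra] = rb
    return True
```

Note that two atoms over the same pair of variables contribute a single edge, and that any atom with three distinct variables already induces a triangle.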

We first show the following: 

Lemma B.2. Cycle-free frontier-one existential rules on $\sigma_S$ can be translated in PTIME to
an equivalent GC$^2$ sentence.

The above claim is clearly implied by the following:

Lemma B.3. For any cycle-free CQ $q(x)$ on $\sigma_S$ with one free variable, $q(x)$ can be
translated in quadratic time to an equivalent GC$^2$ formula with one free variable.

Indeed, once Lemma B.3 is proven, we can show Lemma B.2 by writing the existential rule

$$\forall x_f\, \mathbf{x}\; \big(\phi(x_f, \mathbf{x}) \rightarrow \exists \mathbf{y}\; \psi(x_f, \mathbf{y})\big)$$

as the following, in GC$^2$:

$$\forall x_f\; \big(\phi'(x_f) \rightarrow \psi'(x_f)\big)$$

where $\phi'$ and $\psi'$ are the formulas obtained from $\phi$ and $\psi$ by Lemma B.3.

Let us then show Lemma B.3: 

Proof. We test in PTIME whether $\mathcal{G}(q)$ is connected. If it isn't, we can rewrite $q(x)$
in PTIME as $q'(x) \wedge \bigwedge_i \exists y\, q_i(y)$ where $q'$ and the $q_i$ are CQs whose
conjunction of atoms is connected, and translate $q(x)$ in PTIME by translating $q'$ and each of
the $q_i$. Hence, we assume without loss of generality that $\mathcal{G}(q)$ is connected.

We proceed by induction on $|q|$, the number of atoms of $q$. If $|q| = 1$ the result is trivial.
Otherwise, let $A$ be the set of atoms of $q$ in which the free variable $x$ occurs. Let $X$ be the
set of variables occurring in $A$ except $x$. For any $y \in X$, let $X_y$ be the set of variables
$z$ different from $x$ and $y$ such that there exists a path from $z$ to $y$ in $\mathcal{G}(q)$
which does not go through the vertex $x$. Let $A_y$ be the set of the atoms of $q$ which are not in
$A$ and contain a variable of $\{y\} \cup X_y$. All of these sets can be computed in linear time as
the answers to reachability questions on $\mathcal{G}(q)$, and the number of sets is linear, so the
computation takes at most quadratic time.

We now claim that $\{x\}$, $X$, and the $X_y$ for $y \in X$ are a partition of the variables of
$q$. Indeed, as $\mathcal{G}(q)$ is connected, any variable $z$ different from $x$ is either
adjacent to $x$ (and thus $z \in X$), or there is a path from $x$ to it, and the first variable of
that path after $x$ must be some $y \in X$ (so that $z \in X_y$); this justifies that these sets
cover the variables of $q$. Further, these sets are pairwise disjoint. Indeed, first,
$x \notin X$ and $x \notin X_y$ for all $y \in X$ by construction. Second, if there is a variable
$z \in X \cap X_y$ for some $y \in X$, we have $y \neq z$ as $z \in X_y$, and considering the edges
in $\mathcal{G}(q)$ between $x$ and $y$, $x$ and $z$, and the path from $z$ to $y$ that does not go
through $x$, we have a cycle in $\mathcal{G}(q)$, a contradiction. Third, for $y, y' \in X$,
$y \neq y'$, if $X_y$ and $X_{y'}$ are not disjoint, letting $z \in X_y \cap X_{y'}$, as $x$ and
$y$, $x$ and $y'$ are connected in $\mathcal{G}(q)$, and there is a path from $z$ to $y$ and $z$ to
$y'$ in $\mathcal{G}(q)$ not going through $x$, we have a cycle in $\mathcal{G}(q)$, a
contradiction. For similar reasons, $A$ and the $A_y$ are a partition of the atoms of $q$.

Now observe that, for any $y \in X$, $A_y$ is a conjunction of atoms with free variables $y$ and
$X_y$, and $\mathcal{G}(A_y)$ is acyclic and connected because $\mathcal{G}(q)$ is. Because we have
shown disjointness, we can apply the induction hypothesis to justify that
$\exists X_y\, A_y(X_y, y)$ can be written in GC$^2$ as $F_y(y)$, in time quadratic in the size of
$\exists X_y\, A_y(X_y, y)$. Hence, partitioning $A$ as $A'_x$ (the atoms where only $x$ occurs)
and $A'_y$ for $y \in X$ (the atoms of $A$ where variable $y$ occurs, and the other variable is
necessarily $x$), we can express $q(x)$ as follows in GC$^2$:

$$\bigwedge_{A \in A'_x} A(x) \;\wedge\; \bigwedge_{y \in X} \exists z \left( F_y(z) \wedge \bigwedge_{A \in A'_y} A(x, z) \right)$$

Hence, the overall complexity of the rewriting is quadratic, as the induction hypothesis is 
applied to sets of atoms that are a partition of the atoms of the original input formula, so that 
the quadratic time spent rewriting each set of atoms is quadratic overall in the input formula. 
By induction, the proof is completed. □

We then conclude the proof of Lemma 4.6 by observing that for any FR[1]$_{\mathrm{fnl}}$ rule $r$,
$\mathrm{SHR}(r)$ is indeed a cycle-free frontier-one existential rule on $\sigma_S$. Indeed, we
show this for the head and body with the following lemma:

Lemma B.4. For any non-looping conjunction of atoms $\Phi$, $\mathrm{SHR}(\Phi)$ is cycle-free.

Proof. Any cycle in $\mathcal{G}(\mathrm{SHR}(\Phi))$ clearly translates to a Berge cycle in $\Phi$
that has length $> 2$ or contains a higher-arity atom. In either case, this would contradict the
fact that $\Phi$ is non-looping. □
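Returning to the inductive translation in the proof of Lemma B.3, the recursion can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's procedure verbatim: the input is assumed to be a connected, cycle-free list of atoms of arity at most two, formulas are rendered as plain strings with `exists` and `&`, and the function names are hypothetical. The sketch alternates between two variable names, as a two-variable formula requires, and does not aim for the quadratic time bound.

```python
def show(rel, args, names):
    """Print an atom, renaming its variables according to `names`."""
    return rel + "(" + ",".join(names[v] for v in args) + ")"

def translate(atoms, x, var="x", other="z"):
    """Translate a connected cycle-free CQ with free variable x into a string
    formula whose free variable is named `var`, reusing only two names."""
    at_x = [a for a in atoms if x in a[1]]        # the set A of the proof
    rest = [a for a in atoms if x not in a[1]]
    # Atoms where only x occurs (the set A'_x)
    parts = [show(r, args, {x: var}) for r, args in at_x if set(args) == {x}]
    for y in sorted({v for _, args in at_x for v in args if v != x}):  # the set X
        # Atoms of A joining x and y (the set A'_y); they guard the quantifier
        links = [show(r, args, {x: var, y: other}) for r, args in at_x if y in args]
        # Variables reachable from y without going through x: {y} union X_y
        reach, todo = {y}, [y]
        while todo:
            v = todo.pop()
            for _, args in rest:
                if v in args:
                    for w in args:
                        if w not in reach:
                            reach.add(w)
                            todo.append(w)
        sub = [a for a in rest if reach & set(a[1])]   # the set A_y
        inner = translate(sub, y, other, var)          # F_y, with the names swapped
        body = links + ([inner] if inner else [])
        parts.append("exists " + other + ".(" + " & ".join(body) + ")")
    return " & ".join(parts)
```

For the query $A(x) \wedge R(x, y) \wedge S(y, w) \wedge B(w)$, the sketch produces a nested formula in which each quantifier is guarded by the binary atom linking the two current variables.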

B.2. Proof of Proposition 4.8: QA through shredding 

We start by defining shreddings of interpretations: 

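Definition B.5. For any $\sigma$-interpretation $\mathcal{I}$, the shredding
$\mathrm{SHR}(\mathcal{I})$ of $\mathcal{I}$ is the $\sigma_S$-interpretation $\mathcal{J}$ such
that $R^{\mathcal{J}} := R^{\mathcal{I}}$ for all $R \in \sigma_{\leq 2}$,
$\mathrm{Elt}^{\mathcal{J}} := \mathrm{dom}(\mathcal{I})$, and for every $R \in \sigma_{>2}$, for
each tuple $\mathbf{a} \in R^{\mathcal{I}}$, we create a fresh element
$t \in \mathrm{dom}(\mathcal{J})$, we add $t$ to $A_R^{\mathcal{J}}$, and we add $(t, a_i)$ to
$R_i^{\mathcal{J}}$ for all $1 \leq i \leq |R|$.

It is immediate that for any $\sigma$-interpretation $\mathcal{I}$, its shredding $\mathcal{J}$
satisfies $\mathrm{wf}(\sigma_S)$, and that the unshredding of $\mathcal{J}$ (in the sense of
Definition 4.15) is $\mathcal{I}$.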
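The shredding of Definition B.5 and its inverse can be sketched in a few lines of Python. Several choices here are assumptions of this illustration rather than the paper's definitions: interpretations are dictionaries from relation names to sets of tuples, the domain is taken to be the active domain, fresh elements are pairs `("t", n)`, shredded relations are named `A_R` and `R_i`, and `unshred` is a simple inverse that presumes the well-formedness conditions (Definition 4.15 itself is not reproduced in this section).

```python
from itertools import count

def shred(interp, arity):
    """Shred every relation of arity > 2 into a unary A_R and binary R_1..R_n."""
    fresh = count()
    domain = {a for tuples in interp.values() for t in tuples for a in t}
    out = {"Elt": {(a,) for a in domain}}
    for rel, tuples in interp.items():
        if arity[rel] <= 2:
            out[rel] = set(tuples)  # low-arity relations are copied unchanged
            continue
        out["A_" + rel] = set()
        for i in range(1, arity[rel] + 1):
            out[f"{rel}_{i}"] = set()
        for t in sorted(tuples):
            elem = ("t", next(fresh))  # fresh element standing for the tuple t
            out["A_" + rel].add((elem,))
            for i, a in enumerate(t, start=1):
                out[f"{rel}_{i}"].add((elem, a))
    return out

def unshred(shredded, arity):
    """Rebuild the original relations, assuming each tuple element has exactly
    one R_i-successor for every position i."""
    out = {}
    for rel, n in arity.items():
        if n <= 2:
            out[rel] = set(shredded[rel])
            continue
        cols = [dict(shredded[f"{rel}_{i}"]) for i in range(1, n + 1)]
        out[rel] = {tuple(col[t] for col in cols) for (t,) in shredded["A_" + rel]}
    return out
```

On any input of this shape, `unshred(shred(I, arity), arity)` returns `I`, matching the observation that the unshredding of the shredding is the original interpretation.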

We first show the following lemma, stating that negations of CQs, facts and existential rules
are preserved by shredding.

Lemma B.6. For every fact $F$, CQ $q$ and set $\Delta$ of existential rules, for any interpretation
$\mathcal{I}$, $\mathcal{I}$ satisfies $\Delta$, $\neg q$ and the existential closure of $F$ iff
$\mathrm{SHR}(\mathcal{I})$ satisfies $\mathrm{SHR}(\Delta)$, $\neg\mathrm{SHR}(q)$, and the
existential closure of $\mathrm{SHR}(F)$.

To show this, we define the notion of homomorphism:

Definition B.7. For any interpretations $\mathcal{I}$ and $\mathcal{I}'$, a mapping
$h : \mathrm{dom}(\mathcal{I}) \rightarrow \mathrm{dom}(\mathcal{I}')$ is a homomorphism from
$\mathcal{I}$ to $\mathcal{I}'$ if for every relation $R \in \sigma$, for any tuple
$\mathbf{a} \in R^{\mathcal{I}}$, the tuple $h(\mathbf{a}) = (h(a_1), \ldots, h(a_{|R|}))$ is in
$R^{\mathcal{I}'}$.

This notion extends to homomorphisms from queries to interpretations in the usual manner.

Lemma B.8. For any two interpretations $\mathcal{I}$ and $\mathcal{I}'$, any homomorphism from
$\mathcal{I}$ to $\mathcal{I}'$ can be extended to a homomorphism from $\mathrm{SHR}(\mathcal{I})$
to $\mathrm{SHR}(\mathcal{I}')$, and conversely any homomorphism from $\mathrm{SHR}(\mathcal{I})$
to $\mathrm{SHR}(\mathcal{I}')$ can be restricted to a homomorphism from $\mathcal{I}$ to
$\mathcal{I}'$.

Proof. This is immediate, paying attention to the fact that a homomorphism $h$ from $\mathcal{I}$
to $\mathcal{I}'$ defines a mapping from the tuples of $\mathcal{I}$ to the tuples of
$\mathcal{I}'$, which describes how to extend $h$ to a homomorphism from
$\mathrm{SHR}(\mathcal{I})$ to $\mathrm{SHR}(\mathcal{I}')$ by defining the image of $h$ on
$\mathrm{dom}(\mathrm{SHR}(\mathcal{I})) \setminus \mathrm{dom}(\mathcal{I})$.

Conversely, given a homomorphism from $\mathrm{SHR}(\mathcal{I})$ to $\mathrm{SHR}(\mathcal{I}')$,
its restriction to $\mathrm{dom}(\mathcal{I})$ is easily seen to be a homomorphism from
$\mathcal{I}$ to $\mathcal{I}'$. □

We now prove Lemma B.6. 

Proof. We prove each part of the claim:

Query $q$. By Lemma B.8, there is a homomorphism from $\mathrm{SHR}(q)$ to
$\mathrm{SHR}(\mathcal{I})$ iff there is a homomorphism from $q$ to $\mathcal{I}$.

Fact $F$. Similar to the case of the query.

Rules $\Delta$. Consider any existential rule $r \in \Delta$.

Assume that $\mathcal{I} \models r$. Consider a homomorphism $h$ from the body of
$\mathrm{SHR}(r)$ (which is the shredding of the body of $r$) to $\mathrm{SHR}(\mathcal{I})$, and
let us show that the image of $h$ is not a violation of $\mathrm{SHR}(r)$ in
$\mathrm{SHR}(\mathcal{I})$. By Lemma B.8, $h$ can be restricted to a homomorphism $h'$ from the
body of $r$ to $\mathcal{I}$. Hence, because $\mathcal{I} \models r$, $h'$ can be extended to a
homomorphism $h''$ from the body and head of $r$ to $\mathcal{I}$. By Lemma B.8, $h''$ can be
extended to a homomorphism $h'''$ from the shredding of the head and body of $r$ to
$\mathrm{SHR}(\mathcal{I})$ that matches $h$ on the body of $\mathrm{SHR}(r)$. So we conclude that
$h$ does not witness a violation of $\mathrm{SHR}(r)$.

Conversely, assume that $\mathrm{SHR}(\mathcal{I}) \models \mathrm{SHR}(r)$. Consider a
homomorphism $h$ from the body of $r$ to $\mathcal{I}$. As previously, $h$ can be extended to a
homomorphism $h'$ from the body of $\mathrm{SHR}(r)$ to $\mathrm{SHR}(\mathcal{I})$, which can be
extended to a homomorphism $h''$ from the body and head of $\mathrm{SHR}(r)$ to
$\mathrm{SHR}(\mathcal{I})$. Again we use Lemma B.8 to justify that this defines a homomorphism
$h'''$ from the body and head of $r$ to $\mathcal{I}$ that matches $h$ on the body of $r$, and
conclude that $h$ does not witness a violation of $r$. □

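Having proved Lemma B.6, we show the preservation of GC$^2$ constraints:

Lemma B.9. For every interpretation $\mathcal{I}$ and GC$^2$ theory $\Sigma$, we have
$\mathcal{I} \models \Sigma$ iff $\mathrm{SHR}(\mathcal{I}) \models \Sigma$.

Proof. The restrictions $\mathcal{I}|_{\sigma_{\leq 2}}$ and
$\mathrm{SHR}(\mathcal{I})|_{\sigma_{\leq 2}}$ of $\mathcal{I}$ and $\mathrm{SHR}(\mathcal{I})$ to
$\sigma_{\leq 2}$ are identical (remember that the $R_i$ in $\sigma_S \setminus \sigma$ are fresh so
they do not occur in $\Sigma$), hence $\mathcal{I}$ and $\mathrm{SHR}(\mathcal{I})$ satisfy the same
GC$^2$ constraints. □

We can now prove one direction of the result: if there is a counterexample interpretation of
$(\exists \mathbf{x}\, F(\mathbf{x})) \wedge \Sigma \wedge \Delta \wedge \neg q$, its shredding is
an interpretation of
$(\exists \mathbf{x}\mathbf{t}\; \mathrm{SHR}(F)(\mathbf{x}, \mathbf{t})) \wedge \mathrm{SHR}(\Delta) \wedge \neg\mathrm{SHR}(q)$
(Lemma B.6) that satisfies $\Sigma$ (Lemma B.9) and $\mathrm{wf}(\sigma_S)$ (by our initial
immediate observation about the shredding of interpretations).

What remains is to prove the converse direction of decoding an interpretation $\mathcal{J}$ of
$\Theta := (\exists \mathbf{x}\mathbf{t}\; \mathrm{SHR}(F)(\mathbf{x}, \mathbf{t})) \wedge \Sigma \wedge \mathrm{SHR}(\Delta) \wedge \mathrm{wf}(\sigma_S) \wedge \neg\mathrm{SHR}(q)$.
This is harder, because we must argue that $\mathcal{J}$ can be understood as the shredding of a
$\sigma$-interpretation for the above results to apply. This requires us to deal with the issue of
redundant tuples.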

Definition B.10. A $\sigma_S$-interpretation $\mathcal{J}$ is redundancy-free if there is no
$R \in \sigma_{>2}$, no $t \neq t'$ in $\mathrm{dom}(\mathcal{J})$, and no $|R|$-tuple $\mathbf{a}$
such that $(t, a_i)$ and $(t', a_i)$ belong to $R_i^{\mathcal{J}}$ for all $1 \leq i \leq |R|$.

Redundant tuples are the only obstacle that prevents us from understanding any interpretation of
$\mathrm{wf}(\sigma_S)$ as the shredding of some $\sigma$-interpretation. Indeed:

Lemma B.11. SHR is a bijection from $\sigma$-interpretations to redundancy-free
$\sigma_S$-interpretations satisfying $\mathrm{wf}(\sigma_S)$.

Proof. This is clear, as, writing $\mathrm{SHR}^{-1}$ for the unshredding operation of
Definition 4.15, we have already observed that, for any $\sigma$-interpretation $\mathcal{I}$, we
have $(\mathrm{SHR}^{-1} \circ \mathrm{SHR})(\mathcal{I}) = \mathcal{I}$. Further, given a
redundancy-free $\sigma_S$-interpretation $\mathcal{J}$ satisfying $\mathrm{wf}(\sigma_S)$, it is
immediate that $(\mathrm{SHR} \circ \mathrm{SHR}^{-1})(\mathcal{J}) = \mathcal{J}$. This concludes
the proof. □

As redundancy-freeness cannot be expressed in GC$^2$, our counterexample interpretation
$\mathcal{J}$ may not satisfy it. But this does not matter. Recalling our definition of $\Theta$
above, we show:

Lemma B.12. If $\Theta$ has an interpretation then it has a redundancy-free interpretation.

Proof. Let $\mathcal{J}$ be a $\sigma_S$-interpretation of $\Theta$.

Define the equivalence relation $\sim$ on $\mathrm{dom}(\mathcal{J})$ as follows: $t \sim t'$ if,
for some $R \in \sigma_{>2}$, $(t)$ and $(t')$ are both in $A_R^{\mathcal{J}}$ and for every
$1 \leq i \leq |R|$, $(t, z) \in R_i^{\mathcal{J}}$ iff $(t', z) \in R_i^{\mathcal{J}}$. The
conditions of $\mathrm{wf}(\sigma_S)$ ensure that this is an equivalence relation, because the
$A_R^{\mathcal{J}}$ are pairwise disjoint. Define
$\chi_\sim : \mathrm{dom}(\mathcal{J}) \rightarrow \mathrm{dom}(\mathcal{J})/{\sim}$ the function
mapping every element of $\mathrm{dom}(\mathcal{J})$ to its $\sim$-equivalence class, and let
$\mathcal{J}'$ be the image of $\mathcal{J}$ under $\chi := \chi_\sim$.

$\mathcal{J}'$ is redundancy-free as any $t, t' \in \mathrm{dom}(\mathcal{J}')$ witnessing
redundancy in $\mathcal{J}'$ would have as preimage by $\chi$ two elements of $\mathcal{J}$ that
are $\sim$-equivalent. (This uses the fact that, by $\mathrm{wf}(\sigma_S)$, two elements of
$A_R^{\mathcal{J}}$ and $A_{R'}^{\mathcal{J}}$ cannot be adjacent in $\mathcal{G}(\mathcal{J})$ for
any $R, R' \in \sigma_{>2}$.)

It is easily checked that $\mathcal{J}'$ is still an interpretation of $\mathrm{wf}(\sigma_S)$. As
$\chi$ is a homomorphism from $\mathcal{J}$ to $\mathcal{J}'$, and $\mathcal{J}$ satisfies the
existential closure of $F$, $\mathcal{J}'$ also satisfies it. Further, because the restrictions of
$\mathcal{J}'$ and $\mathcal{J}$ to $\sigma_{\leq 2}$ coincide, $\mathcal{J}'$ is still an
interpretation of $\Sigma$.

To show that $\mathcal{J}'$ still satisfies $\neg\mathrm{SHR}(q)$, it suffices to show the
existence of a homomorphism from $\mathcal{J}'$ to $\mathcal{J}$. We build such a homomorphism $h$
by setting, for all $a \in \mathrm{dom}(\mathcal{J}')$, $h(a) := a'$ for any preimage $a'$ of $a$
by $\chi$. To see why $h$ is a homomorphism, consider any tuple $\mathbf{t} \in R^{\mathcal{J}'}$
for some $R \in \sigma_S$. Let $\mathbf{t}' \in R^{\mathcal{J}}$ be a preimage of the tuple
$\mathbf{t}$ by $\chi$. Clearly, by $\mathrm{wf}(\sigma_S)$, unless $R$ is one of the fresh binary
relations $R_i$, all elements of $\mathbf{t}'$ are singletons in their $\sim$-class, so that
necessarily $\mathbf{t}' = h(\mathbf{t})$ and $h(\mathbf{t}) \in R^{\mathcal{J}}$. If $R = R_i$,
write $\mathbf{t} = (u, a)$ and $\mathbf{t}' = (u', a')$. We then have that $h(\mathbf{t})$ is a
pair $(u'', a'')$ and necessarily $a'' = a'$ because by $\mathrm{wf}(\sigma_S)$ we know that
$(a') \in \mathrm{Elt}^{\mathcal{J}}$ so $a' \sim a''$ implies $a' = a''$. Now, as $u' \sim u''$,
as $u', u'' \in A_R^{\mathcal{J}}$ and the $A_R^{\mathcal{J}}$ are pairwise disjoint by
$\mathrm{wf}(\sigma_S)$, we have $(u', a') \in R_i^{\mathcal{J}}$ iff
$(u'', a'') \in R_i^{\mathcal{J}}$, so that indeed $h(\mathbf{t}) \in R_i^{\mathcal{J}}$. Hence,
$h$ is indeed a homomorphism from $\mathcal{J}'$ to $\mathcal{J}$. Thus $\mathcal{J}'$ satisfies
$\neg\mathrm{SHR}(q)$ because $\mathcal{J}$ does.

For any existential rule $r$, to show that $\mathcal{J}'$ still satisfies $\mathrm{SHR}(r)$, it
suffices to observe that $\chi \circ h$ is the identity on $\mathrm{dom}(\mathcal{J}')$, so that
any match $m$ of the body of $\mathrm{SHR}(r)$ in $\mathcal{J}'$ gives such a match in
$\mathcal{J}$ which, as $\mathcal{J} \models \mathrm{SHR}(r)$, extends to a match of the body and
head which is mapped back by $\chi$ to a match of the body and head of $\mathrm{SHR}(r)$ in
$\mathcal{J}'$, so that $m$ does not witness a violation of $\mathrm{SHR}(r)$. Hence,
$\mathcal{J}'$ still satisfies $\mathrm{SHR}(\Delta)$. □
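The quotient used in the proof of Lemma B.12 can be illustrated as follows. In this Python sketch, each equivalence class is represented by one chosen element rather than by the class itself (which yields an isomorphic result); the dictionary encoding, the relation names `A_R` and `R_i`, and the function name are assumptions of this example.

```python
def merge_redundant(shredded, rel, n):
    """Merge the elements of A_rel that have the same R_i-successors for all i.

    Returns the quotiented interpretation and the map chi sending each element
    of A_rel to the representative of its class.
    """
    names = [f"{rel}_{i}" for i in range(1, n + 1)]

    def successors(t):
        # Successor sets of t under R_1, ..., R_n: t ~ t' iff these coincide
        return tuple(frozenset(a for u, a in shredded[name] if u == t)
                     for name in names)

    rep, chi = {}, {}
    for (t,) in sorted(shredded["A_" + rel]):
        chi[t] = rep.setdefault(successors(t), t)
    out = dict(shredded)
    out["A_" + rel] = {(chi[t],) for (t,) in shredded["A_" + rel]}
    for name in names:
        out[name] = {(chi[t], a) for t, a in shredded[name]}
    return out, chi
```

After this step, no two distinct elements of `A_R` encode the same tuple, so the result is redundancy-free for that relation.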

We can now complete the proof of Proposition 4.8 with the backwards direction: given our
interpretation $\mathcal{J}$ of $\Theta$, make it redundancy-free by Lemma B.12, and now unshred it
to an interpretation $\mathcal{I}'$ such that, by Lemma B.11,
$\mathrm{SHR}(\mathcal{I}') = \mathcal{J}$. We conclude by Lemma B.6 and Lemma B.9 that
$\mathcal{I}'$ satisfies $\Delta$, $\Sigma$, $\neg q$, and the existential closure of $F$.

B.3. Proof of Lemma 4.14: Unraveling for GC$^2$

We present the formal unraveling process. In all of this section, we work only on the signature
$\sigma_S$.

Definition B.13. For any interpretation $\mathcal{J}$, the induced interpretation
$\mathcal{J}|_{\mathbf{a}}$ of $\mathcal{J}$ by $\mathbf{a} \subseteq \mathrm{dom}(\mathcal{J})$ is
the interpretation containing all the tuples of $\mathcal{J}$ where only elements of $\mathbf{a}$
occur. A guarded pair in $\mathcal{J}$ is a pair $\{a, b\}$ of two distinct elements of
$\mathrm{dom}(\mathcal{J})$ such that $a$ and $b$ co-occur in some tuple of $\mathcal{J}$. The
immediate neighborhood $\mathrm{IN}_{\mathcal{J}}(a)$ of $a \in \mathrm{dom}(\mathcal{J})$ in
$\mathcal{J}$ is
$\{b \in \mathrm{dom}(\mathcal{J}) \setminus \{a\} \mid \{a, b\} \text{ guarded pair in } \mathcal{J}\}$.

The bags of an interpretation $J$ are the interpretations induced by all guarded pairs of $J$.
The bag graph of a $\sigma_S$-interpretation $J$ is the undirected graph on the bags of $J$ (without
self-loops) where two distinct bags are adjacent whenever their domains share one common
element. (As the domains have size two, they must then share exactly one element.)
Given a witness $\mathcal{W}$ of a fact $F$ in $J$, we alter the definition of the bag graph of $J$ by adding
one fact bag corresponding to the witness $\mathcal{W}$; the fact bag is adjacent to all bags with which it
shares one element (but not those with which it shares two elements).
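
As an informal illustration of Definition B.13 and of the bag graph, the following Python sketch computes guarded pairs, immediate neighborhoods, induced interpretations and the bag graph of a finite interpretation. The encoding of an interpretation as a set of (relation name, tuple of elements) pairs and all function names are assumptions made for this sketch only; the fact bag is not handled here.

```python
from itertools import combinations


def guarded_pairs(tuples):
    """Guarded pairs: sets {a, b} of distinct elements co-occurring in some tuple."""
    return {frozenset(p)
            for _rel, args in tuples
            for p in combinations(set(args), 2)}


def immediate_neighborhood(tuples, a):
    """Elements b != a such that {a, b} is a guarded pair."""
    return {b for pair in guarded_pairs(tuples) if a in pair
            for b in pair if b != a}


def induced(tuples, elems):
    """Induced interpretation: tuples in which only elements of `elems` occur."""
    return {(rel, args) for rel, args in tuples if set(args) <= set(elems)}


def bag_graph(tuples):
    """Bags are identified with their domains (guarded pairs); two distinct
    bags are adjacent when their domains share exactly one element."""
    bags = guarded_pairs(tuples)
    edges = {frozenset((p, q))
             for p, q in combinations(bags, 2) if len(p & q) == 1}
    return bags, edges
```

For instance, on the interpretation with tuples R(a, b), S(b, c) and U(a), the guarded pairs are {a, b} and {b, c}, and the bag graph has a single edge between the two corresponding bags.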

Definition B.14. A tree-like interpretation is a tree $T = (W, E, b_r)$ where each $b \in W$ is a
bag (that is, an interpretation), $b_r \in W$ is the root bag, and $E$ is the directed edge relation. We
require that for all $(b, b') \in E$, the domains of $b$ and $b'$ share exactly one element $u$ such that
$u$ occurs in $T$ exactly at the following places: in $b$ (we say it was introduced in $b$), and in all
children of $b$ (including $b'$). Further, if two bags $b$ and $b'$ in $W$ share some element then either
they are siblings in $T$ or one is a child of the other in $T$. We write $\mathrm{dom}(T) = \bigcup_{b \in W} \mathrm{dom}(b)$
and also see $T$ as the interpretation $\bigcup_{b \in W} b$.

Given a fact $F$ and a witness $\mathcal{W}$ of $F$ in $J$, we say that $T$ is an unraveling of $J$ preserving
$\mathcal{W}$ if $b_r$ is the fact bag of $J$, all other bags of $T$ have domain of size 2, and elements of $\mathrm{dom}(\mathcal{W})$
only occur in $b_r$ (we say they were introduced in $b_r$).

Our goal will be to construct an unraveling of the counterexample interpretation, because 
of the following: 

Lemma B.15. If $T$ is an unraveling of an interpretation $J$ preserving a witness $\mathcal{W}$ of a fact
$F$, then $T$ is an interpretation where $\mathcal{W}$ is also a witness of $F$ and which is cycle-free except
for $\mathcal{W}$ (recall Definition 4.13).

Proof. Except for $\mathrm{dom}(\mathcal{W})$, $\mathcal{G}(T)$ is a tree which matches $T$: if any two elements $u, v$ of $\mathrm{dom}(T)$
are not both in $\mathrm{dom}(\mathcal{W})$ and co-occur in a tuple of $T$, this edge of $\mathcal{G}(T)$ corresponds to the
edge between the bag where $u$ was introduced and the bag where $v$ was introduced. □

However, we also want the unraveling to be faithful, so the constraints are preserved. 

Definition B.16. $T = (W, E, b_r)$ is a faithful unraveling of an interpretation $J$ preserving $\mathcal{W}$
(where $\mathcal{W}$ is a witness of a fact $F$) if it is an unraveling of $J$ preserving $\mathcal{W}$ such that there
exists a homomorphism $\pi$ from $T$ to $J$, and a mapping $\phi$ from $W$ to the bags of $J$ that maps
$b_r$ to the fact bag, and maps no other bag to the fact bag. We require that:

(Compat) $\phi$ is compatible with $\pi$: for any $b \in W$, $\pi|_{\mathrm{dom}(b)}$ is an isomorphism between $b$ and
$\phi(b)$, and it is even the identity for $b = b_r$;

(IN) for every $a \in \mathrm{dom}(T)$, $\pi|_{\mathrm{IN}^{T}(a)}$ is an isomorphism between $\mathrm{IN}^{T}(a)$ and $\mathrm{IN}^{J}(\pi(a))$;

(Surj) $\phi$ is surjective except for $\mathcal{W}$: for any bag $b$ of $J$ whose domain is not a subset of
$\mathrm{dom}(\mathcal{W})$, $b$ has a preimage by $\phi$.

We say an interpretation is unravelable if all elements of the interpretation occur in at least 
one tuple for a binary relation, and if its bag graph is connected; we can assume without loss 
of generality that interpretations are unravelable by adding tuples for a fresh binary relation 
to satisfy these conditions. We now claim: 

Proposition B.17. For any fact $F$, GC$^2$ constraints $\Theta$, and CQ $q'$, if $J$ is an unravelable
interpretation that satisfies $\Theta$ and $\neg q'$ and has a witness $\mathcal{W}$ of $F$, and $T$ is a faithful unraveling
of $J$ preserving $\mathcal{W}$, then $T$ (seen as an interpretation) satisfies $\Theta$, $\neg q'$, and the existential
closure of $F$ (in fact $\mathcal{W}$ is still a witness of $F$ in $T$).

Proof. It is clear that $\mathcal{W}$ is still a witness of $F$ in $T$. $T$ also satisfies $\neg q'$, since there exists a
homomorphism from $T$ to $J$, so if $T$ satisfied $q'$ then so would $J$.

We must show that $T$ still satisfies $\Theta$. Up to expanding the original interpretation by
interpreting new relation names, following [Kaz04] we can rewrite the GC$^2$ constraints $\Theta$ as
a conjunction of a formula of GF$^2$ (the guarded fragment with two variables but no number
restrictions) and number restrictions of the form $\forall x\, \exists^{\bowtie n} y\, R(x, y)$ where $n \in \mathbb{N}$, $\bowtie \in \{\geq, \leq\}$,
and $R$ is a binary relation.

The fact that the number restrictions are preserved is immediate, since they only depend on
the immediate neighborhood of elements, which are isomorphically preserved by $\pi$ according
to property (IN).

We show that GF$^2$ is preserved by showing the existence of a guarded bisimulation from
$T$ to $J$ [GHO02]. We define the guarded bisimulation as the set $\mathcal{I}$ of all restrictions of $\pi$ to
singletons and guarded pairs of $T$, which are indeed partial isomorphisms from $T$ to $J$. We
show that the back and forth conditions are satisfied. For any $f : X \to Y$ in $\mathcal{I}$:

Forth. Consider a guarded set $Z$ of $T$. There is a partial isomorphism $f'$ in $\mathcal{I}$ with domain $Z$,
and it agrees with $f$ on $Z \cap X$ as they are both restrictions of $\pi$.

Back. Consider a guarded set $Z$ of $J$. As $J$ is unravelable, all singletons of $J$ occur in
some guarded pair of $J$, so it suffices to consider the case where $|Z| = 2$. Let $b$ be the
corresponding bag of $J$. We distinguish depending on whether $Z$ does not intersect $Y$
or whether it does:

If $|Z \cap Y| = 0$, either $\mathrm{dom}(b) \subseteq \mathrm{dom}(\mathcal{W})$, so we can find an isomorphism of $\mathcal{I}$ with domain
$\pi^{-1}(Z)$ because $\pi$ is the identity on $\mathrm{dom}(\mathcal{W})$, or, as $\phi$ is surjective (property (Surj)), there
exists $b' \in W$ such that $\phi(b') = b$ and thus, because $\phi$ and $\pi$ are compatible by property
(Compat), the image of $\pi|_{\mathrm{dom}(b')}$ is $Z$, so there is a corresponding partial isomorphism in
$\mathcal{I}$.

If $|Z \cap Y| \neq 0$, the only non-trivial case is $|Z \cap Y| = 1$. Let $a$ be the element of $Z \cap Y$.
Because by property (IN) $\pi|_{\mathrm{IN}^{T}(a)}$ is an isomorphism from $\mathrm{IN}^{T}(a)$ to $\mathrm{IN}^{J}(\pi(a))$, there
exists a guarded pair $X'$ of $T$ such that $\pi(X') = Z$; hence, there is a partial isomorphism
$f'$ in $\mathcal{I}$ from $X'$ to $Z$, and it agrees with $f$ as both are restrictions of $\pi$.

This concludes the proof. □ 

We must now show that a faithful unraveling exists: 

Proposition B.18. For any fact $F$, for any unravelable interpretation $J$ and witness $\mathcal{W}$ of
$F$ in $J$, there is a faithful unraveling $T$ of $J$ preserving $\mathcal{W}$.

Proof. To build $T$, define the root $b_r$ of $T$ as $\mathcal{W}$, set $\phi(b_r) := \mathcal{W}$, initialize $\pi$ as the identity
on $\mathcal{W}$, and define inductively $T = (W, E, b_r)$, the homomorphism $\pi$ and the mapping $\phi$, as
follows. At every bag $b \in W$, consider the corresponding bag $\phi(b)$ of $J$. For every element $a$
introduced in $b$ (there is only one except for $b = b_r$), consider every bag $b''$ in the bag graph
of $J$ that shares element $\pi(a)$ with $\phi(b)$ (so $b''$ is adjacent to $\phi(b)$ in the bag graph). Letting
$\mathrm{dom}(b'') = \{\pi(a), a''\}$, create a bag $b'''$ in $T$ as a child of $b$, with domain $\{a, a'\}$ where $a'$ is fresh
and where we set $\pi(a') := a''$ and $\phi(b''') := b''$, and make $b'''$ an isomorphic copy of $b''$ following
the mapping $\pi$. Perform the same process inductively on all child bags.

It is clear that the result of this process is indeed an unraveling of $J$. It is also clear that $\pi$
thus defined is a homomorphism, as any created tuple in $T$ clearly has a homomorphic image
via $\pi$. Last, it is clear that $\phi$ maps $b_r$, and only $b_r$, to the fact bag. For property (Compat),
it is clear that the restriction of $\pi$ to any bag $b$ of $T$ is an isomorphism between $b$ and $\phi(b)$.
For property (IN), for any element $a \in \mathrm{dom}(T)$ it is clear that $\pi|_{\mathrm{IN}^{T}(a)}$ is an isomorphism from
$\mathrm{IN}^{T}(a)$ to $\mathrm{IN}^{J}(\pi(a))$: $\mathrm{IN}^{T}(a)$ consists of the union of the bag $b_a$ where $a$ was introduced and the
children of $b_a$ with which $a$ is shared (i.e., all children, except at $b_r$), which corresponds exactly
to the bags of $J$ where $\pi(a)$ occurs. For property (Surj), the surjectivity of $\phi$ is because $J$ is
unravelable, so all bags of $J$ are reachable from the fact bag. □
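
The inductive construction in this proof can be sketched in Python as follows. This is only a depth-bounded illustration (the actual unraveling is in general infinite), and it rests on assumptions that are not in the text: an interpretation is encoded as a set of (relation name, tuple of elements) pairs, the witness is given as a set of such tuples and used directly as the root bag, fresh elements are named `("fresh", k)`, and the names `unravel` and `max_depth` are hypothetical.

```python
from itertools import combinations, count


def guarded_pairs(tuples):
    """Guarded pairs: sets {a, b} of distinct elements co-occurring in some tuple."""
    return {frozenset(p)
            for _rel, args in tuples
            for p in combinations(set(args), 2)}


def unravel(tuples, witness, max_depth):
    """Return (bags, pi): bags is a list of (parent index, tuples of the bag),
    with the root at index 0, and pi maps elements of the tree to elements of J."""
    pairs = guarded_pairs(tuples)
    witness_dom = frozenset(x for _rel, args in witness for x in args)
    fresh = count()
    pi = {a: a for a in witness_dom}          # pi is the identity on the witness
    bags = [(None, set(witness))]             # root bag
    # Work list of (bag index, element introduced there, domain of phi(bag), depth).
    frontier = [(0, a, witness_dom, 0) for a in witness_dom]
    while frontier:
        parent, a, phi_dom, depth = frontier.pop()
        if depth == max_depth:
            continue
        for pair in pairs:
            # Bags of J sharing exactly the element pi(a) with phi(bag).
            if pi[a] in pair and len(pair & phi_dom) == 1:
                (other,) = pair - {pi[a]}
                a_new = ("fresh", next(fresh))
                pi[a_new] = other
                rename = {pi[a]: a, other: a_new}
                # Isomorphic copy of the bag of J induced by `pair`.
                content = {(rel, tuple(rename[x] for x in args))
                           for rel, args in tuples if set(args) <= pair}
                bags.append((parent, content))
                frontier.append((len(bags) - 1, a_new, pair, depth + 1))
    return bags, pi
```

On a triangle R(a, b), R(b, c), R(c, a) with witness R(a, b), each bag has exactly one child, so the tree consists of two chains hanging from the root, and every copied tuple is mapped back by `pi` to a tuple of the original interpretation.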

This concludes the proof: we make the interpretation unravelable without loss of generality, 
unravel it with Proposition B.18, and Proposition B.17 and Lemma B.15 ensure that the result 
satisfies the required conditions. 

B.4. Proof of Lemma 4.16: Treeification soundness

We call a bad cycle in a conjunction of $\sigma$-atoms a Berge cycle of length > 2 or containing a
higher-arity atom (following Definition 4.1).

Let $F$ be a $\sigma$-fact, $\tau$ be an $\mathrm{FR}[1]_{\mathrm{Hnl}}$ rule, and $J$ be a $\sigma_S$-interpretation. Assume that $J$ is
cycle-free except for $\mathrm{SHR}(F)$, and let $\mathcal{W}$ be the witness whose existence is guaranteed by this.
Let $I$ be the unshredding of $J$. Similarly to Lemma B.4, it is easily seen that this implies that
$I$ is non-looping except within the domain of the unshredding $\mathcal{W}'$ of $\mathcal{W}$.

Now, assume that $J$ satisfies $\mathrm{SHR}(\mathrm{TR}_F(\tau))$, and assume that $I \not\models \tau$. Let $f$ be a mapping
from the body of $\tau$ to $I$ that witnesses the violation. We consider the dependency $\tau'$ (implied
by $\tau$) obtained by identifying all variables of the body of $\tau$ that are mapped to the same
element by $f$. We can thus see $f$ as a match of $\tau'$ that maps all variables of the body of $\tau'$
to distinct elements. If $\tau'$ is an $\mathrm{FR}[1]_{\mathrm{Fnl}}$ rule, then it is in $\mathrm{TR}_F(\tau)$ (taking $\mathbf{x}' = \emptyset$), so that if $I$
violates $\tau'$ then it violates $\mathrm{TR}_F(\tau)$, contradicting the fact that it is the unshredding of $J$, which
satisfies $\mathrm{SHR}(\mathrm{TR}_F(\tau))$ (as in Proposition 4.8).

Hence, assume that $\tau'$ is not an $\mathrm{FR}[1]_{\mathrm{Fnl}}$ rule, so that its body has a bad cycle. Because $f$
maps all variables in the body of $\tau'$ to distinct elements of $I$, the image of any bad cycle of the
body of $\tau'$ by $f$ is a bad cycle of $I$. Hence, as $I$ is non-looping except for $\mathcal{W}'$, any bad cycle of
$\tau'$ must be mapped by $f$ to elements of $\mathrm{dom}(\mathcal{W}')$. Now consider $\tau''$ obtained from $\tau'$ by setting
$\mathbf{x}'$ to be the variables mapped to elements of $\mathrm{dom}(\mathcal{W}')$, setting $g$ to map each variable $x$
of $\mathbf{x}'$ to the variable $z$ of $F$ to which $f(x)$ corresponds (there is precisely one, as $\mathcal{W}$ is a
witness of $\mathrm{SHR}(F)$, which we have defined to include the atoms $P_z(z)$), and performing the
construction $g(\tau'')$ as in Definition 4.10. The result $\tau''$ is in $\mathrm{FR}[1]_{\mathrm{Fnl}}$, as otherwise a bad cycle
in it translates to a bad cycle in $\tau'$ of elements not matched to $\mathrm{dom}(\mathcal{W}')$, which, as we have
seen, contradicts the fact that $I$ is non-looping except within $\mathrm{dom}(\mathcal{W}')$. So $\tau''$ is in $\mathrm{TR}_F(\tau)$,
and $f$ is also a match of $\tau''$ that maps the frontier variable to the same element. Hence, as
$I \models \mathrm{TR}_F(\tau)$, we have a contradiction of the fact that $f$ witnesses a violation.

This concludes the proof. 

C. Proofs for Section 5: Adding Functional Dependencies 

C.1. Proof of Theorem 5.1: QA is undecidable for FDs and single-head FR[1] rules

Call $\mathrm{FR}[1]_{\mathrm{SH}}$ the class of single-head frontier-one rules. Recall the definition of the entailment
problem (Definition A.1) and of UFD (Definition A.3). We will write rules of ID[1] and ID[2]
as in Definition A.5.

By Lemma A.2, the entailment problem for $\mathrm{FR}[1]_{\mathrm{SH}} \wedge \mathrm{UFD}$ and ID[2] reduces to QA for
$\mathrm{FR}[1]_{\mathrm{SH}} \wedge \mathrm{UFD}$, so it suffices to show the undecidability of the former to show undecidability
of the latter. We will do so by adapting the result of [Mit83], who showed that implication
of ID[2] rules by UFD and ID[2] constraints is undecidable. We will need to consider a special
form of the problem studied in [Mit83]:

Definition C.1. The restricted UFD/ID[2] entailment problem is the entailment problem for
$\mathrm{UFD} \wedge \mathrm{ID}[2]$ and ID[2] where the input is restricted so that there is only one relation $R$, and,
for any ID[2] rule $R^a R^b \subseteq R^c R^d$ in the input, the UFD $R^a \to R^b$ holds in the input.

We now state our variant of the undecidability result in [Mit83]: 

Theorem C.2. The restricted UFD/ID[2] entailment problem is undecidable. 

Proof. We recall the proof technique of [Mit83]. The proof gives a reduction to the entailment
problem from the following undecidable problem: given a system of equations of the form
$x = y \circ z$ on functional monoids, decide if a certain equation $x_0 = y_0 \circ z_0$ is entailed by the
system.

This problem is reduced to the entailment problem in the following way. Given such a system,
we create a relation $R$ with one attribute $R^x$ per variable $x$, plus an extra attribute $R^a$. We
impose the UFD $R^a \to R^x$ and the ID[1] rule $R^x \subseteq R^a$ for each position $R^x$ of $R$ (except $R^a$).
This ensures that the projection of $R$ to $R^a R^x$ can be interpreted as the graph of a function.
Now, equations of the form $x = y \circ z$ can be understood as the corresponding assertions on the
functions represented by $R^a R^x$, $R^a R^y$ and $R^a R^z$, and Lemma 4 of [Mit83] shows that such an
assertion can actually be enforced by an ID[2]-like constraint: $R^y R^x \subseteq R^a R^z$. Those constraints
are not necessarily ID[2] constraints because we may have $R^x = R^y$.

We observe that we can enforce that we always have $x \neq y$ in such constraints by adding
more equations. For every variable $x$, we replace all its occurrences in the equations by fresh
variables $x_1, \ldots, x_n$, and we add the equations $x_1 = x_2, \ldots, x_{n-1} = x_n$. Clearly the resulting
problem is equivalent to the original one, and the encoding of each constraint $x = y \circ z$ is now
an actual ID[2] rule. Similarly to Lemma 4 of [Mit83], we observe that the new equations of
the form $x_i = x_{i+1}$ are equivalent to asserting $R^a R^{x_i} \subseteq R^a R^{x_{i+1}}$ and $R^a R^{x_{i+1}} \subseteq R^a R^{x_i}$ on the
projections.
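
The variable-splitting step just described can be illustrated by the following Python sketch. The representation of an equation $x = y \circ z$ as a triple of variable names, the naming scheme `x_1, x_2, ...` for the fresh copies (which assumes that no original variable already has such a name), and the function name `split_variables` are assumptions of this sketch.

```python
def split_variables(equations):
    """Replace every occurrence of every variable by a fresh copy, and return
    the rewritten equations together with the chain of equalities
    x_1 = x_2, ..., x_{n-1} = x_n that ties the copies of each variable."""
    copies = {}  # variable name -> number of copies created so far

    def fresh(v):
        copies[v] = copies.get(v, 0) + 1
        return f"{v}_{copies[v]}"

    new_equations = [tuple(fresh(v) for v in eq) for eq in equations]
    equalities = [(f"{v}_{i}", f"{v}_{i + 1}")
                  for v, n in copies.items() for i in range(1, n)]
    return new_equations, equalities
```

For example, the single equation $x = x \circ y$ becomes $x_1 = x_2 \circ y_1$ together with the equality $x_1 = x_2$, so that the three variables of the rewritten equation are pairwise distinct.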

We now observe that the implication problem of [Mit83] with the above restriction can in
fact be assumed to be in the form of the restricted UFD/ID[2] problem, except that it features
some ID[1] rules. Indeed, each of the ID[2] rules in the encoding of the equations $x = y \circ z$ is
of the form $\tau : R^y R^x \subseteq R^a R^z$, and the UFD constraint $\phi : R^a \to R^z$ holds. It is clear that
$\tau \wedge \phi \models \phi'$, where $\phi' : R^y \to R^x$. Indeed, any violation of $\phi'$ in an interpretation satisfying
$\tau$ implies by $\tau$ the existence of a violation of $\phi$. Hence, the problem is equivalent to the one
where we add the UFDs $R^y \to R^x$ for every equation $x = y \circ z$. For the equations of the
form $x_i = x_{i+1}$, as $R^a \to R^{x_i}$ and $R^a \to R^{x_{i+1}}$ hold, the condition of the restricted UFD/ID[2]
problem is also satisfied.

The last step to reduce to the restricted UFD/ID[2] setting is to eliminate the ID[1] rules. We
do this using a variant of Lemma A.6, where we encode each ID[1] rule $\tau : R^p \subseteq S^q$ as the ID[2]
rule $R^p R^{\tau,1} \subseteq S^q S^{\tau,2}$, where $R^{\tau,1}$ and $S^{\tau,2}$ are fresh positions of $R$ and $S$ respectively, plus
the UFD $R^p \to R^{\tau,1}$ so that the condition of the restricted UFD/ID[2] problem is respected.
It is easily seen that this does not affect the rest of the proof: projecting away the additional
attributes or populating them with the same value as their determiner cannot violate any of
these additional UFDs. □

What remains now is to show the following: 

Proposition C.3. There is a reduction from the restricted UFD/ID[2] entailment problem to
entailment for $\mathrm{UFD} \wedge \mathrm{FR}[1]_{\mathrm{SH}}$ and ID[2].

Proof. Consider an instance of the restricted UFD/ID[2] entailment problem: we are given a
relation $R$, a set $\Phi$ of UFDs, a set $\Delta$ of ID[2] rules, and the ID[2] rule $\tau$, and we ask whether
$\Phi \wedge \Delta \models \tau$.

Let $n$ be the number of positions of $R$. We construct the relation $S$ whose positions are $S^{i,1}$
and $S^{i,2}$ for every position $R^i$ of $R$. We translate each UFD $\phi : R^p \to R^q$ of $\Phi$ to the two
UFDs $\phi_i : S^{p,i} \to S^{q,i}$ for $i \in \{1, 2\}$, letting $\Phi'$ be the resulting UFDs on $S$. We translate the
ID[2] rule $\tau : R^a R^b \subseteq R^c R^d$ to the ID[2] rule $\tau' : S^{a,1} S^{b,1} \subseteq S^{c,1} S^{d,1}$. We now describe how
each ID[2] rule of $\Delta$ is translated to $\mathrm{FR}[1]_{\mathrm{SH}}$.

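Consider an ID[2] rule $\delta : R^a R^b \subseteq R^c R^d$. We create a first $\mathrm{FR}[1]_{\mathrm{SH}}$ rule

$\delta_1 : \forall \mathbf{x}\, \big(S(x^1_1, \ldots, x^1_n, x^2_1, \ldots, x^2_n) \rightarrow \exists \mathbf{y}\, S(z^1_1, \ldots, z^1_n, z^2_1, \ldots, z^2_n)\big)$

defined as follows:

• $z^1_a$ is $x^1_a$;

• $z^2_a$ is $x^1_a$;

• $z^2_b$ is $y^1_b$;

• otherwise, $z^i_j$ is $y^i_j$.

We create a second $\mathrm{FR}[1]_{\mathrm{SH}}$ rule

$\delta_2 : \forall \mathbf{x}\, \big(S(x^1_1, \ldots, x^1_n, x^2_1, \ldots, x^2_n) \rightarrow \exists \mathbf{y}\, S(z^1_1, \ldots, z^1_n, z^2_1, \ldots, z^2_n)\big)$

defined as follows:

• $z^2_a$ is $x^2_a$;

• $z^1_c$ is $x^2_a$;

• $z^1_d$ is $y^2_b$;

• otherwise, $z^i_j$ is $y^i_j$.

For instance, the ID[2] rule $\delta : R^1 R^2 \subseteq R^3 R^4$ would be encoded as:

$\delta_1 : \forall \mathbf{x}\, \big(S(x^1_1, x^1_2, x^1_3, x^1_4, x^2_1, x^2_2, x^2_3, x^2_4) \rightarrow \exists \mathbf{y}\, S(x^1_1, y^1_2, y^1_3, y^1_4, x^1_1, y^1_2, y^2_3, y^2_4)\big)$

$\delta_2 : \forall \mathbf{x}\, \big(S(x^1_1, x^1_2, x^1_3, x^1_4, x^2_1, x^2_2, x^2_3, x^2_4) \rightarrow \exists \mathbf{y}\, S(y^1_1, y^1_2, x^2_1, y^2_2, x^2_1, y^2_2, y^2_3, y^2_4)\big)$

Note that, by the condition of the restricted UFD/ID[2] entailment problem, the UFD $R^1 \to R^2$
holds in $\Phi$. Hence, $y^1_2$ in the head of the first rule must be matched to the same element as $x^1_2$,
and likewise for $y^2_2$ in the second rule.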
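
To make the shape of the two rules concrete, the following Python sketch generates the body atom and the two head atoms of $\delta_1$ and $\delta_2$ for a given ID[2] rule $R^a R^b \subseteq R^c R^d$ over a relation of arity $n$. The string encoding of variables (`x1_3` standing for $x^1_3$, `y2_4` for $y^2_4$) and the function name `encode_id2` are assumptions of this sketch, which also assumes $a \neq b$ and $c \neq d$.

```python
def encode_id2(n, a, b, c, d):
    """Return (body, head1, head2): the argument lists of the S-atom in the
    body of both rules, and of the S-atoms in the heads of delta_1 and delta_2.
    Positions are 1-indexed; the first n arguments are copy 1, the last n copy 2."""
    def x(i, j):
        return f"x{i}_{j}"   # universally quantified variable x^i_j

    def y(i, j):
        return f"y{i}_{j}"   # existentially quantified variable y^i_j

    positions = range(1, n + 1)
    body = [x(1, j) for j in positions] + [x(2, j) for j in positions]

    def head(overrides):
        # By default z^i_j is y^i_j; `overrides` lists the exceptions.
        z = {(i, j): y(i, j) for i in (1, 2) for j in positions}
        z.update(overrides)
        return [z[(1, j)] for j in positions] + [z[(2, j)] for j in positions]

    # delta_1: z^1_a = x^1_a, z^2_a = x^1_a, z^2_b = y^1_b.
    head1 = head({(1, a): x(1, a), (2, a): x(1, a), (2, b): y(1, b)})
    # delta_2: z^2_a = x^2_a, z^1_c = x^2_a, z^1_d = y^2_b.
    head2 = head({(2, a): x(2, a), (1, c): x(2, a), (1, d): y(2, b)})
    return body, head1, head2
```

Calling `encode_id2(4, 1, 2, 3, 4)` reproduces the two head atoms of the example above. Each head contains a single universally quantified variable, which reflects that both rules are frontier-one.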

We let $\Delta'$ be the result of this encoding of $\Delta$, and we claim that $\Phi \wedge \Delta \models \tau$ iff $\Phi' \wedge \Delta' \models \tau'$.

To this end, we first show that, for any ID[2] constraint $\delta : R^a R^b \subseteq R^c R^d$ of $\Delta$, with $\phi :
R^a \to R^b$ in $\Phi$ by the assumption of the restricted UFD/ID[2] entailment problem, considering
the translations $\phi_1, \phi_2 \in \Phi'$ of $\phi$, and considering $\delta_1$ and $\delta_2 \in \Delta'$ as defined above, letting
$\delta' : S^{a,1} S^{b,1} \subseteq S^{c,1} S^{d,1}$ be the intuitive ID[2] translation of $\delta$ to $S$, the following entailment
holds: $\delta_1 \wedge \delta_2 \wedge \phi_1 \wedge \phi_2 \models \delta'$. In other words, our rewriting $\delta_1$ and $\delta_2$ of $\delta$ implies the
straightforward rewriting $\delta'$.

Indeed, consider an interpretation $I$ of $\delta_1 \wedge \delta_2 \wedge \phi_1 \wedge \phi_2$. Consider a tuple
$t = (u^1_1, \ldots, u^1_n, u^2_1, \ldots, u^2_n) \in S^{I}$. We wish to show that it does not witness a violation of $\delta'$.
By $\delta_1$, there exists a tuple $(v^1_1, \ldots, v^1_n, v^2_1, \ldots, v^2_n) \in S^{I}$ with $v^1_a = u^1_a$, $v^2_a = u^1_a$, and
$v^1_b = v^2_b$. As $I$ satisfies $\phi_1$, as $v^1_a = u^1_a$, we must have $v^1_b = u^1_b$, so that $v^2_b = u^1_b$. Now, by
$\delta_2$, there exists a tuple $t' = (w^1_1, \ldots, w^1_n, w^2_1, \ldots, w^2_n) \in S^{I}$ with $w^2_a = v^2_a$, $w^1_c = v^2_a$, and
$w^1_d = w^2_b$. Now, as $I$ satisfies $\phi_2$, as $w^2_a = v^2_a$, we must have $w^2_b = v^2_b$. Putting it together,
we have $w^1_c = v^2_a = u^1_a$, and $w^1_d = w^2_b = v^2_b = u^1_b$. Hence, $t'$ witnesses that $t$ is not a
violation of $\delta'$. This proves that, indeed, $\delta_1 \wedge \delta_2 \wedge \phi_1 \wedge \phi_2 \models \delta'$.

Let us now proceed with the proof of the fact that $\Phi \wedge \Delta \models \tau$ iff $\Phi' \wedge \Delta' \models \tau'$, to show that
the reduction is correct. Assume that $\Phi' \wedge \Delta' \not\models \tau'$. Let $I$ be an interpretation of $\Phi' \wedge \Delta'$ that
violates $\tau'$. Let $J$ be the projection of $I$ to the positions $S^{1,1}, \ldots, S^{n,1}$, formally:

$R^{J} = \{(a^1_1, \ldots, a^1_n) \mid (a^1_1, \ldots, a^1_n, a^2_1, \ldots, a^2_n) \in S^{I}\}$

Because $I$ satisfies $\Phi'$, $J$ clearly satisfies $\Phi$. By our previous observation, it is clear that,
because $I$ satisfies $\Delta'$ and $\Phi'$, $J$ satisfies $\Delta$. It is also clear that, because $I$ violates $\tau'$, $J$
violates $\tau$. So $J$ witnesses that $\Phi \wedge \Delta \not\models \tau$.

Conversely, assume that $\Phi \wedge \Delta \not\models \tau$, and let $I$ be a counterexample interpretation. We create
$J$ by constructing $S$ as the product of $R$ by itself: create the tuple $(\mathbf{a}, \mathbf{b}) \in S^{J}$ for every pair of
tuples $\mathbf{a}, \mathbf{b} \in R^{I}$. It is clear that $J$ satisfies $\Phi'$ because $I$ satisfies $\Phi$ (as the FDs are either within
the positions $S^{i,1}$ or within the positions $S^{i,2}$). For the same reason $J$ still violates $\tau'$ because
$I$ did. We now check that $J$ satisfies $\Delta'$. Let $\delta : R^a R^b \subseteq R^c R^d$ be a rule of $\Delta$ and show that
$J$ satisfies $\delta_1$ and $\delta_2$. For $\delta_1$, let $t = (\mathbf{u}, \mathbf{v})$ be a tuple of $S^{J}$. By construction of $J$ we have
$(\mathbf{u}, \mathbf{u}) \in S^{J}$, which witnesses that $t$ is not a violation of $\delta_1$. For $\delta_2$, let $t = (\mathbf{u}, \mathbf{v})$ be a
tuple of $S^{J}$. By construction of $J$ we have $\mathbf{v} \in R^{I}$. As $I$ satisfies $\delta$, there is a tuple $\mathbf{w} \in R^{I}$
such that $w_c = v_a$ and $w_d = v_b$. By construction of $J$, we have $(\mathbf{w}, \mathbf{v}) \in S^{J}$, which witnesses
that $t$ is not a violation of $\delta_2$. Hence $J$ satisfies $\Delta'$, so it witnesses that $\Phi' \wedge \Delta' \not\models \tau'$.

This shows that our reduction is sound, and concludes the proof. □ 
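
The product construction used in the converse direction can be checked on small finite instances with the following Python sketch. Relations are encoded as sets of tuples with 1-indexed positions, and the helper names (`product_instance`, `satisfies_id2`, `satisfies_delta1`, `satisfies_delta2`) are hypothetical; the sketch only tests the two rules $\delta_1$ and $\delta_2$ obtained from one ID[2] rule $R^a R^b \subseteq R^c R^d$, and does not check the UFDs.

```python
from itertools import product


def product_instance(R):
    """S as the product of R by itself: one tuple (u, v) per pair u, v in R."""
    return {u + v for u, v in product(R, repeat=2)}


def pos(n, i, j):
    """0-based index in an S-tuple of position j (1-indexed) in copy i."""
    return (i - 1) * n + (j - 1)


def satisfies_id2(R, a, b, c, d):
    """R^a R^b is included in R^c R^d."""
    return all(any(w[c - 1] == u[a - 1] and w[d - 1] == u[b - 1] for w in R)
               for u in R)


def satisfies_delta1(S, n, a, b):
    """Every t has a head witness s with s^1_a = s^2_a = t^1_a and s^2_b = s^1_b."""
    return all(any(s[pos(n, 1, a)] == t[pos(n, 1, a)]
                   and s[pos(n, 2, a)] == t[pos(n, 1, a)]
                   and s[pos(n, 2, b)] == s[pos(n, 1, b)] for s in S)
               for t in S)


def satisfies_delta2(S, n, a, b, c, d):
    """Every t has a head witness s with s^2_a = s^1_c = t^2_a and s^1_d = s^2_b."""
    return all(any(s[pos(n, 2, a)] == t[pos(n, 2, a)]
                   and s[pos(n, 1, c)] == t[pos(n, 2, a)]
                   and s[pos(n, 1, d)] == s[pos(n, 2, b)] for s in S)
               for t in S)
```

For instance, the relation {(1, 2, 3, 4), (3, 4, 1, 2)} satisfies $R^1 R^2 \subseteq R^3 R^4$, and its product with itself satisfies both $\delta_1$ and $\delta_2$; the relation {(1, 2, 3, 4)} violates the ID[2] rule, and its product violates $\delta_2$.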

We conclude the proof of Theorem 5.1 by combining Lemma A.2, Proposition C.3 and 
Theorem C.2. 

C.2. Proof of Lemma 5.6: FD-safety and cycle-freeness 

Let $\Phi$ be a set of FDs on $\sigma_{>2}$, let $J$ be a $\sigma_S$-interpretation, and assume that it is cycle-free
and FD-safe except for a witness $\mathcal{W}$ (of some $\sigma_S$-fact $F$). Note that there is a slight abuse
of terminology here relative to Definition 4.13: we mean that $J$ is cycle-free except for $F$, and
that $\mathcal{W}$ is a witness satisfying the conditions of the definition of being cycle-free.

Let $I$ be the unshredding of $J$, and consider two tuples $\mathbf{a}$ and $\mathbf{b}$ in $R^{I}$ that violate an FD
$\phi$ of $\Phi$ (remember that this implies $|R| > 2$). By our assumption that the unshredding $\mathcal{W}'$ of
$\mathcal{W}$ satisfies $\Phi$, it is not possible that both $\mathbf{a}$ and $\mathbf{b}$ are in $R^{\mathcal{W}'}$. Let $P$ be the positions of $R$
that are the determiner of $\phi$, and $R^r$ be the position that $\phi$ determines, so that $a_i = b_i$ for all
$R^i \in P$, but $a_r \neq b_r$.

Consider the set $S = \{a_i \mid R^i \in P\}$. If $S$ is not a singleton set, then, as $|R| > 2$ and $\mathbf{a} \neq \mathbf{b}$,
the image of the shredding of $\mathbf{a}$ and $\mathbf{b}$ creates a cycle in $\mathcal{G}(\mathrm{SHR}(I))$, which does not consist
only of elements of $\mathcal{W}$ because $\mathbf{a}$ and $\mathbf{b}$ are not both in $R^{\mathcal{W}'}$. This contradicts the fact that $J$
should be cycle-free except for $\mathcal{W}$. Hence, $S$ is a singleton set.

Accordingly, let $a$ be the common element which is the $a_i$ for any $R^i \in P$. Now, considering
the shredding of $\mathbf{a}$ and $\mathbf{b}$ in $J' := \mathrm{SHR}(I)$, $\mathrm{SHR}(I)$ is such that $(t, a)$ and $(t', a)$ are in $R_i^{J'}$ for
all $R^i \in P$. As $P$ is the determiner of an FD of $\Phi$, $\mathrm{SHR}(I)$ is not FD-safe except for $\mathcal{W}$, because
$t$ and $t'$ cannot both be in $\mathrm{dom}(\mathcal{W})$, otherwise $\mathbf{a}$ and $\mathbf{b}$ would be in $R^{\mathcal{W}'}$. This contradicts the
fact that $J$ is FD-safe except for $\mathcal{W}$.

C.3. Proof of Lemma 5.7: Unraveling with FDs 

We first assume without loss of generality that the $\mathrm{FR}[1]_{\mathrm{Hnl}}$ constraints have only unary or
higher-arity relations in their head. Indeed, for any $\mathrm{FR}[1]_{\mathrm{Hnl}}$ rule $\tau$ violating this condition, we
can replace its head by a fresh unary atom $U(x)$, where $x$ is the frontier variable, and assert
in the GC$^2$ constraints that $U$ implies the head atom of $\tau$.

We first define: 

Definition C.4. A proper guarded pair of a $\sigma_S$-interpretation $J$ is a pair $\{a, b\}$ of distinct
elements of $\mathrm{dom}(J)$ such that $a$ and $b$ co-occur in a relation which is not in $\sigma_S \setminus \sigma$. Note that
if $J$ satisfies $\mathrm{wf}(\sigma_S)$ then, for any guarded pair, either the pair only occurs in tuples for such
relations, or the pair only occurs in tuples for relations of $\sigma_S \setminus \sigma$.

The proper bags of $J$ are the bags induced by proper guarded pairs.

Given a $\sigma_S$-interpretation $J$ and $(a) \in \mathrm{Elt}^{J}$, the arity-two immediate neighborhood $\mathrm{IN}_2^{J}(a)$
of $a$ in $J$ is the restriction of $\mathrm{IN}^{J}(a)$ to the proper guarded pairs.

We give a different name to the unravelings that we will create: 

Definition C.5. $T = (W, E, b_r)$ is an FD-faithful unraveling of an interpretation $J$ preserving
a witness $\mathcal{W}$ given FDs $\Phi$ if it is an unraveling of $J$ preserving $\mathcal{W}$ (recall Definition B.14)
such that there exists a homomorphism $\pi$ from $\mathrm{dom}(T)$ to $\mathrm{dom}(J)$, and a mapping $\phi$ from
$W$ to the bags of $J$ that maps $b_r$ to the fact bag, and maps no other bag to the fact bag. We
require that:

(Compat-P) $\phi$ is compatible with $\pi$: for any $b \in W$ such that $\phi(b)$ is a proper bag, $\pi|_{\mathrm{dom}(b)}$
is an isomorphism between $b$ and $\phi(b)$, and it is even the identity for $b = b_r$;

(IN-2) for every $a \in \mathrm{dom}(T)$, $\pi|_{\mathrm{IN}_2^{T}(a)}$ is an isomorphism between $\mathrm{IN}_2^{T}(a)$ and $\mathrm{IN}_2^{J}(\pi(a))$;

(Surj-P) $\phi$ is surjective for proper bags except for $\mathcal{W}$: for any proper bag $b$ of $J$ whose domain
is not a subset of $\mathrm{dom}(\mathcal{W})$, $b$ has a preimage by $\phi$;

(FD-S) $T$ (seen as an interpretation) is FD-safe except for $\mathcal{W}$;

(Achieve) for any $a \in \mathrm{dom}(T)$, for any relation $R$ of $\sigma_{>2}$, for any subset $P$ of the positions of $R$
which is not a strict superset of an FD determiner of $\Phi$, if $\pi(a)$ is such that $(t', \pi(a)) \in R_i^{J}$
for some $t'$ for all $R^i \in P$, then the same is true of $a$ in $T$ (seen as an interpretation) for
some $t$. Further, unless $P$ is exactly an FD determiner, letting $S := \mathrm{IN}^{T}(t) \setminus \{R_i(t, a) \mid
R^i \in P\}$ and $S' := \mathrm{IN}^{J}(t') \setminus \{R_i(t', \pi(a)) \mid R^i \in P\}$, $\pi|_S$ is an isomorphism between $S$ and
$S'$.

Intuitively, property (Achieve) is designed to preserve exactly what can be asserted by
non-conflicting rules. Except in the case where the frontier variables are exactly a determiner, this
includes the patterns of equalities between the “non-frontier” variables of the atom. We cannot
preserve more, because we need to remain FD-safe.

We modify the definition of unravelable interpretations to require that all elements of the
interpretation occur in at least one tuple for a binary relation not in $\sigma_S \setminus \sigma$, and that its bag
graph is connected even when the non-proper bags are removed. This can be ensured without
loss of generality as before, because the fresh binary relation used to ensure the condition is
not in $\sigma_S \setminus \sigma$.

We must show the correctness of such unravelings: 

Proposition C.6. For any $\sigma_S$-fact $F$, GC$^2$ constraints $\Sigma$, CQ $q'$, FDs $\Phi$, and non-conflicting
FR[1] constraints $\Delta$, if $J$ is an unravelable interpretation that satisfies $\Sigma$, $\mathrm{SHR}(\Delta)$, $\mathrm{wf}(\sigma_S)$,
$\neg q'$, and has a witness $\mathcal{W}$ of $F$, and $T$ is an FD-faithful unraveling of $J$ preserving $\mathcal{W}$, then
$T$ is an interpretation which is FD-safe except for $\mathcal{W}$, it satisfies $\Sigma$, $\mathrm{SHR}(\Delta)$, $\mathrm{wf}(\sigma_S)$, and $\neg q'$,
and $\mathcal{W}$ is still a witness of $F$ in $T$.

Proof. $T$ is clearly FD-safe except for $\mathcal{W}$ by property (FD-S), and it satisfies $\neg q'$ (by the
homomorphism $\pi$). It satisfies $\Sigma$ and $\mathrm{wf}(\sigma_S)$ by the same arguments as in the proof of
Proposition B.17, noting that $\Sigma$ and $\mathrm{wf}(\sigma_S)$ do not refer to the fresh relations of $\sigma_S$, so it is sufficient
to have isomorphisms between arity-two immediate neighborhoods, and to have surjectivity of
$\pi$ for the proper bags only. The harder part is to show that $\mathrm{SHR}(\Delta)$ is satisfied.

Consider any $\tau \in \Delta$, and consider a match $f$ of the body of $\mathrm{SHR}(\tau)$ in $T$, and let $a$ be the
element of $\mathrm{dom}(T)$ to which the frontier variable of $\tau$ is mapped. Consider the image of $f$ by
the homomorphism $\pi$ in $J$. As $J$ satisfies $\mathrm{SHR}(\tau)$, this implies that the element $a' := \pi(a)$ in
$\mathrm{dom}(J)$ is such that the head of $\mathrm{SHR}(\tau)$ can be matched to $J$ with a homomorphism mapping
the frontier variable to $a'$. Now $\tau$ is a single-head dependency, and we made the assumption
that heads were either unary or higher-arity. If the head of $\tau$ is unary, then so is the head
of $\mathrm{SHR}(\tau)$, and, considering the restriction of $\pi$ to any proper bag containing $a$ in $T$ (such
a bag exists as we assumed that the interpretation is unravelable), as this restriction is an
isomorphism, we conclude that the unary head atom to which the head of $\mathrm{SHR}(\tau)$ is matched
in $J$ also has a match in $T$, so that $f$ does not witness a violation of $\mathrm{SHR}(\tau)$. Hence, let us
assume that the head of $\tau$ is higher-arity, and let $R$ be the higher-arity relation.

This means that there is a subset $P$ of positions of $R$ (namely, the set of positions of the
head of $\tau$ where the frontier variable occurs), and there is $t' \in \mathrm{dom}(J)$, such that $(t', a') \in R_i^{J}$
for all $R^i \in P$. We know by the non-conflicting condition that $P$ is not a strict superset of
a determiner of an FD in $\Phi$. If $P$ is exactly a determiner of an FD in $\Phi$, property (Achieve)
ensures $(t, a) \in R_i^{T}$ for all $R^i \in P$ for some $t \in \mathrm{dom}(T)$. Now, by the non-conflicting condition,
all variables in the head of $\tau$ at positions not in $P$ are existential variables and it is their only
occurrence. Hence, the fact that $T$ satisfies $\mathrm{wf}(\sigma_S)$ ensures that the head of $\mathrm{SHR}(\tau)$ has a match
in $T$ mapping the frontier variable to $a$, so that $f$ does not witness a violation of $\mathrm{SHR}(\tau)$.

If $P$ is not a determiner of an FD, then property (Achieve) ensures that $(t, a) \in R_i^{T}$ for all
$R^i \in P$ for some $t \in \mathrm{dom}(T)$, and that $\mathrm{IN}^{T}(t) \setminus \{R_i(t, a) \mid R^i \in P\}$ and $\mathrm{IN}^{J}(t') \setminus \{R_i(t', a') \mid R^i \in P\}$
are isomorphic. This implies that the head of $\mathrm{SHR}(\tau)$ has a suitable match in $T$ so that $f$ does
not witness a violation of $\mathrm{SHR}(\tau)$; indeed, seeing the tuples $(t, a'')$ and $(t', a''')$ in $R_i^{T}$ and $R_i^{J}$
as ground $R$-atoms $A_1$ and $A_2$, the head atom $A$ of $\tau$ has a homomorphism to $A_2$ mapping
the frontier variable to $a'$, and we know that the elements at positions of $A_1$ and $A_2$ which
are not in $P$ have the same equalities, and that $A$ contains the frontier variable at positions $P$
and other variables at the other positions; so $A$ also has a homomorphism to $A_1$ mapping the
frontier variable to $a$. Hence, $T$ satisfies $\mathrm{SHR}(\Delta)$.

This justifies that $T$ satisfies all the required constraints, concluding the proof. □

We now describe the FD-faithful unraveling process: 

Proposition C.7. For any fact $F$, for any set $\Phi$ of FDs on $\sigma_{>2}$, for any unravelable
interpretation $J$ of $\mathrm{wf}(\sigma_S)$ and witness $\mathcal{W}$ of $F$ in $J$ such that the unshredding of $\mathcal{W}$ satisfies $\Phi$,
there is an FD-faithful unraveling $T$ of $J$ preserving $\mathcal{W}$.

Proof. We modify the proof of Proposition B.18 in two ways.

The first modification is that, whenever we unravel on a bag $b$ where the element $a$ was
introduced, and $(\pi(a)) \in \mathrm{Elt}^{J}$, we deal differently with the non-proper bags adjacent to $\phi(b)$
in the bag graph of $J$. We now give details.

Let $B$ be the set of non-proper bags in the bag graph of $J$ that share $\pi(a)$ with $\phi(b)$, to which
we add $\phi(b)$ itself if it is non-proper. We consider all subsets $P$ of positions of all
higher-arity relations $R$ such that $(t', \pi(a)) \in R_i^{J}$ for some $t'$ for all $R^i \in P$, and $P$ is not a strict
superset of a determiner of an FD: we say that $\pi(a)$ occurs at $P$. For any such $P$, we say that
$b'$ (necessarily in $B$) realizes $P$ if $b'$ witnesses that $\pi(a)$ occurs at $P$. We add the following
children (for non-proper bags; for proper bags we do as before): for every such $P$ which is not
an FD determiner, for every bag $b'$ of $B$ that realizes $P$, create one child of $b$ for $b'$ containing
the tuples that witness that $\pi(a)$ occurs at $P$, and unravel on this child; for every such $P$ which
is an FD determiner, and for which it is not already the case that $(t, a) \in R_i^{T}$ for some $t$ for all
$R^i \in P$, pick one bag $b'$ of $B$ that realizes $P$, create one child of $b$ for $b'$ containing the tuples
that witness that $\pi(a)$ occurs at $P$, and unravel on this child.

In other words, informally, for the non-proper bags, we look at all sets of positions of
higher-arity relations in which $\pi(a)$ occurs, keeping only those which are not a strict superset of an FD
determiner. For those which are not FD determiners, we trigger-happily unravel on every bag
where $\pi(a)$ occurs at these positions. For those which are FD determiners, we only unravel if
$a$ does not already occur at those positions, and then we choose only one representative bag. In
all cases, if the current bag $\phi(b)$ is non-proper, we also include it in the bags that we examine:
this may mean that we have an infinite chain in $T$ of copies of this bag, but this is not a
problem, as $T$ is infinite. Also, in all cases, when unraveling on a non-proper bag, we only
copy the tuples witnessing that $\pi(a)$ occurs at the relevant positions; if $\pi(a)$ occurred at other
positions, we do not copy such tuples (we will complete the other positions when unraveling
at the next step, see below).

The second modification is that, when we unravel on a bag $b$ where the element $t$ was
introduced (and the other element is $a'$), and we have $(\pi(t)) \notin \mathrm{Elt}^{J}$ (so that, as $J$ satisfies
$\mathrm{wf}(\sigma_S)$, $(\pi(t)) \in A_R^{J}$ for some $R \in \sigma_{>2}$), we compare the tuples of $b$ and of $\phi(b)$. Indeed, by
the first modification, it may be the case that some tuples of $\phi(b)$ were not copied in $b$. We
let $P'$ be the positions $R^i$ of $R$ such that $(t, a') \notin R_i^{T}$ but $(\pi(t), \pi(a')) \in R_i^{J}$. In addition to the
neighbors of $\phi(b)$ in the bag graph that we would ordinarily consider, we consider a “virtual”
neighbor, a bag containing the tuples $(\pi(t), \pi(a'))$ in its interpretation of $R_i$ for $R^i \in P'$, on
which we also unravel as usual.

In other words, informally, when unraveling non-proper bags representing a higher-arity
ground atom where one element of the atom, introduced at the parent bag of $b$ in $T$, occurs
at multiple positions, and we have only kept a subset of these positions, then the missing
occurrences are seen as another bag, that we will also copy (and which may make us go back,
in G, to the bag for the parent of $b$ in $T$).

To give an example, assume that $(u) \in \mathrm{Elt}^{J}$, we are unraveling on some element
$u$, and $J$ contains the shredding of the $R$-tuples $(u,u,u,u,v)$ and $(u,u,v,v,v)$. Say
$R^1 R^2$ and $R^2 R^3$ are the determiners of the FDs in $\Phi$ on $R$. The unraveling will
create the shredding of the following ground atoms:

• for $R^1$, the $R$-tuples $(u, u_1, u_1, u_1, v_1)$ and $(u, u_2, v_2, v_2, v_2)$ - note how the
other positions where $u$ occurs contain a fresh copy of $u$, created when unraveling on the
virtual neighbor;

• for $R^2$, the $R$-tuples $(u_3, u, u_3, u_3, v_3)$ and $(u_4, u, v_4, v_4, v_4)$;

• for $R^3$, the $R$-tuple $(u_5, u_5, u, u_5, v_5)$;

• for $R^4$, the $R$-tuple $(u_6, u_6, u_6, u, v_6)$;

• for $R^1 R^2$, the $R$-tuple $(u, u, u_7, u_7, v_7)$ - note that only the first tuple was used
as witness, and indeed using also the second would have violated FD-safety;

• for $R^2 R^3$, the $R$-tuple $(u_8, u, u, u_8, v_8)$.
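
The remark on the determiner $R^1 R^2$ can be checked on a toy instance. The sketch below
assumes, purely for illustration, that the FD in question determines position 3, and that
FD-safety amounts in this small setting to the unshredded tuples satisfying the FD; the
function name and the fresh element names are likewise hypothetical.

```python
def violates_fd(tuples, determiner, determined):
    """Return True if two tuples agree on `determiner` but differ on `determined`.

    Positions are 1-based, as in the text.
    """
    seen = {}
    for tup in tuples:
        key = tuple(tup[i - 1] for i in determiner)
        value = tup[determined - 1]
        if key in seen and seen[key] != value:
            return True
        seen.setdefault(key, value)
    return False


# Copy of the first witness (u,u,u,u,v) with u kept at positions 1 and 2.
first = ("u", "u", "u7", "u7", "v7")
# What copying the second witness (u,u,v,v,v) at the same positions would add.
second = ("u", "u", "v9", "v9", "v9")

print(violates_fd([first], (1, 2), 3))
print(violates_fd([first, second], (1, 2), 3))
```

With a single representative the FD holds trivially, whereas the two copies agree on positions
1 and 2 and differ on position 3.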

We show correctness. Properties (Compat-P), (IN-2) hold for the same reasons as in the 
original construction, and (Surj-P) holds because the Gaifman graph of J is connected using 
a fresh binary relation which we consider as a proper bag. FD-safety (property (FD-S)) holds 
initially because the copy of the witness W contains no shredding of higher-arity tuples except 
the ones that occur in the unshredding of W, which satisfies $\Phi$. We show that the property is
preserved during the unraveling, by observing that, whenever we create atoms of $\sigma_S$, of
the form $(t, a)$ for a relation $R_i$, for $R^i$ in a set $P$ of positions of $R$, then either
$a$ is fresh in $T$, or $P$ does not contain an FD determiner, or $P$ is an FD determiner but
$a$ does not occur at these positions already.

We now check that property (Achieve) is satisfied. For any $a \in \mathrm{dom}(T)$, consider the bag b
of T where a was introduced, and let us check the condition. The first part of the condition is 
clear by our construction: for any such P, there is a child of b witnessing that a occurs at the 
right positions in T. The second part of the condition holds because, when P is not an FD 
determiner, we create one child for each bag that realizes P, and unravel on this child. □ 

We conclude as in Lemma 4.14: we make the original interpretation unravelable without loss 
of generality, we apply Proposition C.7, and conclude by Proposition C.6 and Lemma B.15.